Summarizing data into a cross table with a columns_by variable in the columns

Time: 2022-12-22 09:22:30

I am trying to summarize data across two variables, and the output of summarize is very long (at least in R Notebook output, where the table breaks over multiple pages). I'd like to have one variable as the rows of the summary output and the other as the columns, with each cell of the table holding the mean for that row and column combination. Some example data:

dat1 <- data.frame(
    category = rep(c("catA", "catB", "catC"), each = 4),
    age = sample(1:2, size = 4, replace = TRUE),
    value = rnorm(12)
)

and then I would usually get my summary dataframe like this:

dat1 %>% group_by(category, age) %>% summarize(mean(value))

which looks like this: [screenshot of the long-format summary table]

But in my actual data each of the variables has 10+ levels, so the table is very long and hard to read. I would prefer something like this, which I created using:

dat1 %>%
  group_by(category) %>%
  summarize(mean.age1 = mean(value[age == 1]),
            mean.age2 = mean(value[age == 2]))

There must be a better way than hand-coding each mean column?

1 solution

#1


You just need to use tidyr in addition to dplyr and do something like this:

library(dplyr)
library(tidyr)
dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value)) %>%
  spread(age, mean, sep = '')

Output is as follows:

Source: local data frame [3 x 3]
Groups: category [3]

  category      age1      age2
*   <fctr>     <dbl>     <dbl>
1     catA 0.2930104 0.3861381
2     catB 0.5752186 0.1454201
3     catC 1.0845645 0.3117227
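In more recent versions of tidyr, spread() is superseded by pivot_wider(), so the same reshaping can probably be written as below. This is only a sketch, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0; the set.seed() call, the deterministic age column (so that every category has both ages), and the name wide are my own additions for illustration, not part of the question.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible

# Same shape as the question's data, but age alternates 1, 2, 1, 2, ...
# (assumption) so each category is guaranteed to contain both ages.
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(1:2, times = 6),
  value = rnorm(12)
)

wide <- dat1 %>%
  group_by(category, age) %>%
  # one mean per category/age pair; drop the grouping afterwards
  summarise(mean = mean(value), .groups = "drop") %>%
  # one column per age level, named age1, age2, ...
  pivot_wider(names_from = age, values_from = mean, names_prefix = "age")

print(wide)
```

With 10+ levels in the column variable this should scale without any hand-written columns, since pivot_wider() creates one column per distinct value of age. A category/age pair that has no rows ends up as NA in the wide table.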
