Aggregate multiple columns at once [duplicate]

Time: 2021-11-05 22:34:44

I have a data frame like so:

x <-
  id1 id2 val1 val2 val3 val4
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8

I wish to aggregate the above by id1 and id2, and get the means of val1, val2, val3 and val4 at the same time.

How do I do this?

This is what I currently have, but it works for only one column:

agg <- aggregate(x$val1, list(id1 = x$id1, id2 = x$id2), mean)
names(agg)[3] <- "val1"  # Rename the aggregated column

Also, how do I rename the columns that hold the means within the same statement as above?

2 solutions

#1


16  

We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while the . on the left-hand side stands for all other variables in 'df1' (from the example, we assume the mean is needed for every column except the grouping columns). We then specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)
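As a minimal, self-contained sketch of the formula method: the data frame is rebuilt inline with data.frame() so the snippet runs on its own, and the names res_dot and res_cbind are only illustrative. The dot form and an explicit cbind() on the left-hand side should give the same means here, because val1 and val2 are the only non-grouping columns.

```r
# Rebuild the example data (values taken from the question's val1/val2 columns)
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# Dot on the left-hand side: every column that is not a grouping variable
res_dot <- aggregate(. ~ id1 + id2, data = df1, FUN = mean)

# Explicit left-hand side: only the listed columns are averaged
res_cbind <- aggregate(cbind(val1, val2) ~ id1 + id2, data = df1, FUN = mean)

print(res_dot)
```

The cbind() form is the one to reach for when the data frame also holds columns that should not be averaged, for example non-numeric ones.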

Or we can use summarise_each from dplyr after grouping with group_by:

library(dplyr)
df1 %>%
    group_by(id1, id2) %>% 
    summarise_each(funs(mean))
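summarise_each() and funs() are deprecated in more recent dplyr releases. A rough equivalent, assuming dplyr 1.0.0 or later is installed, uses across(). Its .names argument also renames the output columns in the same statement. The mean_ prefix and the name res are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Rebuild the example data so the snippet runs on its own
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

res <- df1 %>%
  group_by(id1, id2) %>%
  # across() applies mean to every non-grouping column;
  # .names builds each output name from the input column name
  summarise(across(everything(), mean, .names = "mean_{.col}"),
            .groups = "drop")

print(res)
```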

Another option is data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'id1' and 'id2', loop through the Subset of Data.table (.SD) with lapply, and take the mean of each column.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)] 
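A small variation, sketched under the assumption that the data.table package is installed: as.data.table() makes a copy instead of converting df1 in place, .SDcols limits which columns are averaged, and setNames() renames them in the same call. The mean_ prefix and the names dt, value_cols and res are illustrative.

```r
library(data.table)

# Rebuild the example data so the snippet runs on its own
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

dt <- as.data.table(df1)  # a copy; df1 itself stays a plain data.frame

value_cols <- c("val1", "val2")  # columns to average

# .SD holds only the .SDcols columns for each (id1, id2) group
res <- dt[, setNames(lapply(.SD, mean), paste0("mean_", value_cols)),
          by = .(id1, id2), .SDcols = value_cols]

print(res)
```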

data

df1 <- structure(list(
    id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
    id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
    val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
    val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)),
    .Names = c("id1", "id2", "val1", "val2"),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

#2


6  

You could try:

agg <- aggregate(list(x$val1, x$val2, x$val3, x$val4), by = list(x$id1, x$id2), mean)
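With an unnamed by list, aggregate() falls back to generated names such as Group.1 and Group.2 for the grouping columns. As a sketch of one way to set all the column names in the same statement, name the elements of both lists. It is shown with val1 and val2 only, since the sample data lists no values for val3 and val4, and the mean_ prefix is an illustrative choice.

```r
# Rebuild the example data so the snippet runs on its own
x <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

agg <- aggregate(
  list(mean_val1 = x$val1, mean_val2 = x$val2),  # names of the value columns
  by = list(id1 = x$id1, id2 = x$id2),           # names of the grouping columns
  FUN = mean
)

print(agg)
```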
