I want to sum rows that have the same value in one column:
> df <- data.frame("1"=c("a","b","a","c","c"), "2"=c(1,5,3,6,2), "3"=c(3,3,4,5,2))
> df
  X1 X2 X3
1  a  1  3
2  b  5  3
3  a  3  4
4  c  6  5
5  c  2  2
For one column (X2), the data can be aggregated to get the sums of all rows that have the same X1 value:
> library(plyr)
> ddply(df, .(X1), summarise, X2=sum(X2))
  X1 X2
1  a  4
2  b  5
3  c  8
How do I do the same for X3 and an arbitrary number of other columns except X1?
This is the result I want:
  X1 X2 X3
1  a  4  7
2  b  5  3
3  c  8  7
4 Solutions
#1
26
ddply(df, "X1", numcolwise(sum))
See ?numcolwise for details and examples.
#2
21
aggregate can easily do this with the formula interface:
aggregate(. ~ X1, data=df, FUN=sum)
##   X1 X2 X3
## 1  a  4  7
## 2  b  5  3
## 3  c  8  7
Equivalently:
aggregate(cbind(X2, X3) ~ X1, data=df, FUN=sum)
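To see how the `.` shorthand scales to more columns, here is a minimal sketch. The extra column X4 and the names df2, res_dot and res_cbind are hypothetical, added only for illustration; they are not part of the original data.

```r
# Hypothetical extension of the question's data: X4 is an invented numeric column.
df2 <- data.frame(
  X1 = c("a", "b", "a", "c", "c"),
  X2 = c(1, 5, 3, 6, 2),
  X3 = c(3, 3, 4, 5, 2),
  X4 = c(10, 20, 30, 40, 50)
)

# `.` on the left-hand side stands for every column not named on the
# right-hand side, so X4 is summed without being listed.
res_dot <- aggregate(. ~ X1, data = df2, FUN = sum)

# The cbind() form needs each column spelled out to give the same sums.
res_cbind <- aggregate(cbind(X2, X3, X4) ~ X1, data = df2, FUN = sum)

print(res_dot)
```

The `.` form is the one that answers the "arbitrary number of other columns" part of the question, since nothing has to change when columns are added.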
#3
6
aggregate is a great function for these sorts of things:
aggregate(df[,-1],df["X1"],sum)
  X1 X2 X3
1  a  4  7
2  b  5  3
3  c  8  7
And a base R version of the numcolwise method from plyr:
aggregate(df[,sapply(df,is.numeric)],df["X1"],sum)
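A sketch of why the is.numeric filter matters, assuming a data frame that also carries a text column. The column `note` and the names df3, num_cols and res_num are made up for illustration. drop = FALSE is an addition here, so that the selection stays a data frame even if only one numeric column remains.

```r
# Hypothetical data: the question's df plus an invented character column `note`.
df3 <- data.frame(
  X1 = c("a", "b", "a", "c", "c"),
  X2 = c(1, 5, 3, 6, 2),
  X3 = c(3, 3, 4, 5, 2),
  note = c("p", "q", "r", "s", "t")
)

# Logical vector marking the numeric columns; X1 and note come out FALSE.
num_cols <- sapply(df3, is.numeric)

# Sum only the numeric columns within each X1 group.
res_num <- aggregate(df3[, num_cols, drop = FALSE], df3["X1"], sum)
print(res_num)
```

Using df3[, -1] here instead would hand `note` to sum, which fails on character data.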
#4
5
A data.table solution for memory efficiency and coding elegance:
library(data.table)
DT <- data.table(df)
DT[, lapply(.SD, sum), by = X1]
.SD is the subset of the data.table for each group defined by the values of X1, excluding the grouping column itself. There are 3 helpful vignettes associated with the data.table package.
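As a rough mental model of `.SD`, the same split-apply-combine steps can be written in base R. This is only a conceptual analogue, not how data.table is implemented; the names per_group, piece and sums are chosen for illustration.

```r
# The question's data.
df <- data.frame(
  X1 = c("a", "b", "a", "c", "c"),
  X2 = c(1, 5, 3, 6, 2),
  X3 = c(3, 3, 4, 5, 2)
)

# One sub-data-frame per X1 value, without the grouping column:
# each element plays the role that .SD plays inside DT[...].
per_group <- split(df[, -1], df$X1)

# Apply sum to every column of each piece, then stack the one-row results.
sums <- do.call(
  rbind,
  lapply(per_group, function(piece) as.data.frame(lapply(piece, sum)))
)
print(sums)
```

Each row of sums corresponds to one X1 group, with the group value carried in the row names rather than in a column.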