I know there is an easy way to do this, but I can't figure it out.
I have a dataframe in my R script that looks something like this:
A B C
1.2 4 8
2.3 4 9
2.3 6 0
1.2 3 3
3.4 2 1
1.2 5 1
Note that A, B, and C are column names. And I'm trying to get variables like this:
sum1 <- [the sum of all B values such that A is 1.2]
num1 <- [the number of times A is 1.2]
Any easy way to do this? I basically want to end up with a data frame that looks like this:
A num totalB
1.2 3 12
etc etc etc
Where "num" is the number of times that particular A value appeared, and "totalB" is the sum of the B values given the A value.
4 solutions
#1
14
I'd use aggregate to get the two aggregates and then merge them into a single data frame:
> df
A B C
1 1.2 4 8
2 2.3 4 9
3 2.3 6 0
4 1.2 3 3
5 3.4 2 1
6 1.2 5 1
> num <- aggregate(B~A,df,length)
> names(num)[2] <- 'num'
> totalB <- aggregate(B~A,df,sum)
> names(totalB)[2] <- 'totalB'
> merge(num,totalB)
A num totalB
1 1.2 3 12
2 2.3 2 10
3 3.4 1 2
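For what it's worth, the two scalar values from the question, and the combined table, can also be obtained without the intermediate merge. A minimal sketch in base R, assuming the data frame is rebuilt as df from the values in the question (res is just an illustrative name):

```r
# Rebuild the example data from the question
df <- data.frame(A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
                 B = c(4, 4, 6, 3, 2, 5),
                 C = c(8, 9, 0, 3, 1, 1))

# Scalars for a single value of A
sum1 <- sum(df$B[df$A == 1.2])  # sum of B where A is 1.2
num1 <- sum(df$A == 1.2)        # number of rows where A is 1.2

# One aggregate() call returning both statistics per group.
# The result holds a matrix column, so flatten it into plain columns.
res <- aggregate(B ~ A, df, function(x) c(num = length(x), totalB = sum(x)))
res <- do.call(data.frame, res)
names(res) <- c("A", "num", "totalB")
res
```

Comparing with == works here because the values are typed in as literals; if A were the result of a computation, a tolerance-based comparison would probably be safer.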
#2
4
Here is a solution using the plyr package:
plyr::ddply(df, "A", plyr::summarize, num = length(A), totalB = sum(B))  # runs without library(plyr), where .() and summarize would not be found
#3
4
Here is a solution using data.table for memory and time efficiency:
library(data.table)
DT <- as.data.table(df)
DT[, list(totalB = sum(B), num = .N), by = A]
To subset only rows where C == 1 (as per the comment on @aix's answer):
DT[C==1, list(totalB = sum(B), num = .N), by = A]
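A self-contained version of the two calls above, as a sketch: it assumes data.table is installed, rebuilds the example data directly as DT, and uses keyby instead of by so that the result comes back sorted by A. The names res_all and res_c1 are illustrative only.

```r
library(data.table)

# Example data from the question
DT <- data.table(A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
                 B = c(4, 4, 6, 3, 2, 5),
                 C = c(8, 9, 0, 3, 1, 1))

# Count and sum of B per value of A; keyby sorts the result by A
res_all <- DT[, .(num = .N, totalB = sum(B)), keyby = A]

# Same summary restricted to rows where C == 1
res_c1 <- DT[C == 1, .(num = .N, totalB = sum(B)), keyby = A]
```

Here .() is data.table's shorthand for list(), so the calls are otherwise the same as the ones above.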
#4
1
In dplyr:
library(tidyverse)
A <- c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2)
B <- c(4, 4, 6, 3, 2, 5)
C <- c(8, 9, 0, 3, 1, 1)
df <- tibble(A, B, C)  # data_frame() is deprecated in favour of tibble()
df %>%
  group_by(A) %>%
  summarise(num = n(),
            totalB = sum(B))
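The C == 1 restriction mentioned in the data.table answer can be sketched in dplyr as well, by adding filter() before the grouping. This assumes only dplyr is installed (not the whole tidyverse) and uses a plain data.frame with the question's values; res_c1 is an illustrative name.

```r
library(dplyr)

# Example data from the question
df <- data.frame(A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
                 B = c(4, 4, 6, 3, 2, 5),
                 C = c(8, 9, 0, 3, 1, 1))

# Keep only rows where C is 1, then count rows and sum B per value of A
res_c1 <- df %>%
  filter(C == 1) %>%
  group_by(A) %>%
  summarise(num = n(), totalB = sum(B))
```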