I have a dataframe and I would like to count the number of rows within each group. I regularly use the aggregate
function to sum data as follows:
df2 <- aggregate(x ~ Year + Month, data = df1, sum)
Now, I would like to count observations but can't seem to find the proper argument for FUN. Intuitively, I thought it would be as follows:
df2 <- aggregate(x ~ Year + Month, data = df1, count)
But no such luck.
Any ideas?
Some toy data:
set.seed(2)
df1 <- data.frame(x = 1:20,
Year = sample(2012:2014, 20, replace = TRUE),
Month = sample(month.abb[1:3], 20, replace = TRUE))
11 Solutions
#1
36
There is also count() from the plyr package, which takes the data frame as its first argument:
df2 <- count(df1, c('Year', 'Month'))
#2
51
Following @Joshua's suggestion, here's one way you might count the number of observations in your df dataframe where Year = 2007 and Month = Nov (assuming they are columns). Note that the logical condition goes in the row position, before the comma:
nrow(df[df$Year == 2007 & df$Month == "Nov", ])
and with aggregate, following @GregSnow:
aggregate(x ~ Year + Month, data = df, FUN = length)
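As a quick sanity check, here is a minimal sketch that applies the same call to the toy data from the question (df1 rather than df). The column name n is my own choice for illustration; aggregate() itself names the result column x.

```r
# Toy data from the question
set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

# length() returns the number of x values in each Year/Month group
counts <- aggregate(x ~ Year + Month, data = df1, FUN = length)

# Rename the result column so it reads as a count (name is illustrative)
names(counts)[names(counts) == "x"] <- "n"

# The group counts should add up to the number of rows in df1
sum(counts$n) == nrow(df1)
```

Only combinations that actually occur in the data appear in the result; empty Year/Month combinations are not listed.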
#3
27
We can also use dplyr.
First, some data:
df <- data.frame(x = rep(1:6, rep(c(1, 2, 3), 2)), year = 1993:2004, month = c(1, 1:11))
Now the count:
library(dplyr)
count(df, year, month)
#piping
df %>% count(year, month)
We can also use a slightly longer version with piping and the n() function:
df %>%
group_by(year, month) %>%
summarise(number = n())
or the tally() function:
df %>%
group_by(year, month) %>%
tally()
#4
25
An old question without a data.table solution. So here goes...
Using .N:
library(data.table)
DT <- data.table(df)
DT[, .N, by = list(year, month)]
#5
20
The simple option to use with aggregate is the length function, which will give you the length of the vector in each subset. Sometimes a slightly more robust choice is function(x) sum(!is.na(x)), which counts only the non-missing values.
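The difference between the two only shows up when the counted column contains NA. Here is a small sketch with made-up data (df_na is a hypothetical data frame, not from the question). One caveat: the formula interface of aggregate() defaults to na.action = na.omit, which drops NA rows before FUN is called, so na.action = na.pass is set here to make the contrast visible.

```r
# Hypothetical data: two missing x values, one per year
df_na <- data.frame(
  x = c(1, NA, 3, 4, NA, 6),
  Year = c(2012, 2012, 2012, 2013, 2013, 2013)
)

# Count every row in the group, missing or not
n_rows <- aggregate(x ~ Year, data = df_na, FUN = length,
                    na.action = na.pass)

# Count only the rows where x is not missing
n_non_missing <- aggregate(x ~ Year, data = df_na,
                           FUN = function(x) sum(!is.na(x)),
                           na.action = na.pass)

n_rows         # 3 rows per year
n_non_missing  # 2 non-missing values per year
```

With the default na.action, both calls would report 2 per year, because the NA rows never reach FUN.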
#6
16
Create a new variable Count with a value of 1 for each row:
df1["Count"] <- 1
Then aggregate the dataframe, summing the Count column (the grouping columns in df1 are capitalized, Year and Month):
df2 <- aggregate(df1[c("Count")], by = list(Year = df1$Year, Month = df1$Month), FUN = sum, na.rm = TRUE)
#7
14
An alternative to the aggregate() function in this case would be table() with as.data.frame(), which would also indicate which combinations of Year and Month are associated with zero occurrences:
df<-data.frame(x=rep(1:6,rep(c(1,2,3),2)),year=1993:2004,month=c(1,1:11))
myAns<-as.data.frame(table(df[,c("year","month")]))
And without the zero-occurring combinations:
myAns[which(myAns$Freq>0),]
#8
2
For my aggregations I usually end up wanting to see the mean and "how big is this group" (a.k.a. length). So this is my handy snippet for those occasions:
agg.mean <- aggregate(columnToMean ~ columnToAggregateOn1*columnToAggregateOn2, yourDataFrame, FUN="mean")
agg.count <- aggregate(columnToMean ~ columnToAggregateOn1*columnToAggregateOn2, yourDataFrame, FUN="length")
aggcount <- agg.count$columnToMean
agg <- cbind(aggcount, agg.mean)
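For a concrete version of the template above, here is a sketch on a small made-up data frame (the values and the column names mean_x and n are illustrative assumptions). It joins the two results with merge() on the grouping columns instead of cbind(), which avoids relying on both aggregate() calls returning rows in the same order.

```r
# Hypothetical data with four Year/Month groups
df1 <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  Year = c(2012, 2012, 2012, 2013, 2013, 2013),
  Month = c("Jan", "Jan", "Feb", "Jan", "Feb", "Feb")
)

# Mean and group size, computed separately
agg_mean <- aggregate(x ~ Year + Month, data = df1, FUN = mean)
agg_count <- aggregate(x ~ Year + Month, data = df1, FUN = length)

# Give the value columns distinct names before joining
names(agg_mean)[names(agg_mean) == "x"] <- "mean_x"
names(agg_count)[names(agg_count) == "x"] <- "n"

# Join on the grouping columns so each row carries both statistics
agg <- merge(agg_mean, agg_count, by = c("Year", "Month"))
agg
```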
#9
0
Considering @Ben's answer, R would throw an error if df1 does not contain an x column. But it can be solved elegantly with paste:
aggregate(paste(Year, Month) ~ Year + Month, data = df1, FUN = NROW)
Similarly, it can be generalized if more than two variables are used in grouping:
aggregate(paste(Year, Month, Day) ~ Year + Month + Day, data = df1, FUN = NROW)
#10
0
A SQL solution using the sqldf package:
library(sqldf)
sqldf("SELECT Year, Month, COUNT(*) as Freq
FROM df1
GROUP BY Year, Month")
#11
-1
lw<- function(x){length(which(df$variable==someValue))}
agg<- aggregate(Var1~Var2+Var3, data=df, FUN=lw)
names(agg)<- c("Some", "Pretty", "Names", "Here")
View(agg)