Count the number of rows within each group

Time: 2021-07-11 09:11:30

I have a data frame and I would like to count the number of rows within each group. I regularly use the aggregate function to sum data as follows:

df2 <- aggregate(x ~ Year + Month, data = df1, sum)

Now, I would like to count observations but can't seem to find the proper argument for FUN. Intuitively, I thought it would be as follows:

df2 <- aggregate(x ~ Year + Month, data = df1, count)

But, no such luck.

Any ideas?

Some toy data:

set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

11 solutions

#1


36  

There is also df2 <- count(df1, c('Year', 'Month')) from the plyr package. Note that count() takes the data frame as its first argument, not the column x.

#2


51  

Following @Joshua's suggestion, here's one way you might count the number of observations in your df dataframe where Year = 2007 and Month = Nov (assuming they are columns):

nrow(df[df$Year == 2007 & df$Month == "Nov", ])

and with aggregate, following @GregSnow:

aggregate(x ~ Year + Month, data = df, FUN = length)

#3


27  

We can also use dplyr.

First, some data:

df <- data.frame(x = rep(1:6, rep(c(1, 2, 3), 2)), year = 1993:2004, month = c(1, 1:11))

Now the count:

library(dplyr)
count(df, year, month)
#piping
df %>% count(year, month)

We can also use a slightly longer version with piping and the n() function:

df %>% 
  group_by(year, month) %>%
  summarise(number = n())

or the tally() function:

df %>% 
  group_by(year, month) %>%
  tally()

#4


25  

An old question without a data.table solution. So here goes...

Using .N

library(data.table)
DT <- data.table(df)
DT[, .N, by = list(year, month)]

#5


20  

The simple option to use with aggregate is the length function, which gives you the length of the vector in each subset. Sometimes it is a little more robust to use function(x) sum(!is.na(x)), which counts only the non-missing values.
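
To make the difference between the two concrete, here is a minimal sketch on a small hypothetical data frame (the values below are made up for illustration and are not the question's toy data). It assumes you want rows with a missing x kept in the row count, so it passes na.action = na.pass; the formula method of aggregate drops rows with NA by default, which would otherwise hide the difference.

```r
# Hypothetical toy data: one missing x value in the 2012/Jan group.
df1 <- data.frame(x     = c(1, NA, 3, 4, 5),
                  Year  = c(2012, 2012, 2012, 2013, 2013),
                  Month = c("Jan", "Jan", "Feb", "Jan", "Jan"))

# length() counts every row in the group, including rows where x is NA.
# na.action = na.pass keeps those rows; the formula method omits them by default.
n_rows <- aggregate(x ~ Year + Month, data = df1, FUN = length,
                    na.action = na.pass)

# sum(!is.na(x)) counts only the non-missing x values in each group.
n_obs <- aggregate(x ~ Year + Month, data = df1,
                   FUN = function(x) sum(!is.na(x)),
                   na.action = na.pass)

print(n_rows)  # 2012/Jan is counted as 2 rows
print(n_obs)   # 2012/Jan is counted as 1 non-missing observation
```

Which of the two you want depends on whether "count" means rows in the group or usable observations of x.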

#6


16  

Create a new variable Count with a value of 1 for each row:

df1["Count"] <- 1

Then aggregate dataframe, summing by the Count column:

df2 <- aggregate(df1["Count"], by = list(Year = df1$Year, Month = df1$Month), FUN = sum, na.rm = TRUE)

#7


14  

An alternative to the aggregate() function in this case would be table() with as.data.frame(), which would also indicate which combinations of year and month are associated with zero occurrences:

df<-data.frame(x=rep(1:6,rep(c(1,2,3),2)),year=1993:2004,month=c(1,1:11))

myAns<-as.data.frame(table(df[,c("year","month")]))

And without the zero-occurring combinations:

myAns[which(myAns$Freq>0),]

#8


2  

For my aggregations I usually end up wanting to see the mean and "how big is this group" (a.k.a. length). So this is my handy snippet for those occasions:

agg.mean <- aggregate(columnToMean ~ columnToAggregateOn1*columnToAggregateOn2, yourDataFrame, FUN="mean")
agg.count <- aggregate(columnToMean ~ columnToAggregateOn1*columnToAggregateOn2, yourDataFrame, FUN="length")
aggcount <- agg.count$columnToMean
agg <- cbind(aggcount, agg.mean)

#9


0  

Considering @Ben's answer, R would throw an error if df1 does not contain an x column. But this can be solved elegantly with paste:

aggregate(paste(Year, Month) ~ Year + Month, data = df1, FUN = NROW)

Similarly, it can be generalized if more than two variables are used in grouping:

aggregate(paste(Year, Month, Day) ~ Year + Month + Day, data = df1, FUN = NROW)

#10


0  

A solution using the sqldf package:

library(sqldf)
sqldf("SELECT Year, Month, COUNT(*) as Freq
       FROM df1
       GROUP BY Year, Month")

#11


-1  

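# Count, within each group, how many Var1 values equal someValue
# (someValue is a placeholder for the value of interest).
lw <- function(x) length(which(x == someValue))

agg <- aggregate(Var1 ~ Var2 + Var3, data = df, FUN = lw)

# The result has three columns: the two grouping variables and the count.
names(agg) <- c("Some", "Pretty", "Names")

View(agg)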
