Count number of rows in a data frame in R based on group [duplicate]

This question already has an answer here:

Count number of rows within each group 10 answers

I have a data frame in R like this:

  ID   MONTH-YEAR   VALUE
  110   JAN. 2012     1000
  111   JAN. 2012     2000
         .         .
         .         .
  121   FEB. 2012     3000
  131   FEB. 2012     4000
         .           .
         .           .

So, for each month of each year there are n rows, and they can be in any order (that is, the rows for a given month are not necessarily contiguous). I want to calculate how many rows there are for each MONTH-YEAR, i.e. how many rows there are for JAN. 2012, how many for FEB. 2012, and so on. Something like this:

 MONTH-YEAR   NUMBER OF ROWS
 JAN. 2012     10
 FEB. 2012     13
 MAR. 2012     6
 APR. 2012     9

I tried to do this:

n_row <- nrow(dat1_frame %.% group_by(MONTH-YEAR))

but it does not produce the desired output. How can I do that?

8 solutions

#1

Here's an example that shows how table(.) (or, more closely matching your desired output, data.frame(table(.))) does what it sounds like you are asking for.

Note also how to share reproducible sample data in a way that others can copy and paste into their session.

Here's the (reproducible) sample data:

mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L), 
                       MONTH.YEAR = c("JAN. 2012", "JAN. 2012", 
                                      "FEB. 2012", "FEB. 2012", 
                                      "MAR. 2012"), 
                       VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)), 
                  .Names = c("ID", "MONTH.YEAR", "VALUE"), 
                  class = "data.frame", row.names = c(NA, -5L))

mydf
#    ID MONTH.YEAR VALUE
# 1 110  JAN. 2012  1000
# 2 111  JAN. 2012  2000
# 3 121  FEB. 2012  3000
# 4 131  FEB. 2012  4000
# 5 141  MAR. 2012  5000

Here's the calculation of the number of rows per group, in two output display formats:

table(mydf$MONTH.YEAR)
# 
# FEB. 2012 JAN. 2012 MAR. 2012 
#         2         2         1

data.frame(table(mydf$MONTH.YEAR))
#        Var1 Freq
# 1 FEB. 2012    2
# 2 JAN. 2012    2
# 3 MAR. 2012    1
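
Note that table() sorts the groups alphabetically (FEB before JAN) and names the columns Var1 and Freq. If the output should follow the order in which the months first appear in the data and carry the headings from the question, one possible sketch is to tabulate a factor with explicit levels. This assumes the rows of mydf are already in chronological order; NUMBER.OF.ROWS is a made-up column name.

```r
# Sample data mirroring mydf above.
mydf <- data.frame(ID = c(110L, 111L, 121L, 131L, 141L),
                   MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
                                  "FEB. 2012", "FEB. 2012",
                                  "MAR. 2012"),
                   VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L),
                   stringsAsFactors = FALSE)

# Use order of first appearance as the level order
# (assumption: the data is already sorted chronologically).
month_levels <- unique(mydf$MONTH.YEAR)

# Tabulate the factor; responseName sets the name of the count column.
counts <- as.data.frame(table(factor(mydf$MONTH.YEAR, levels = month_levels)),
                        responseName = "NUMBER.OF.ROWS",
                        stringsAsFactors = FALSE)
names(counts)[1] <- "MONTH.YEAR"
counts
```

The result has one row per month, in the order JAN, FEB, MAR for this sample.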

#2

The count() function in plyr does what you want:

library(plyr)

# mydf (from the first answer) names the column MONTH.YEAR
count(mydf, "MONTH.YEAR")

#3

Using the example data set that Ananda dummied up, here's an example using aggregate(), which is part of core R. aggregate() just needs something to count as a function of the different values of MONTH.YEAR. In this case, I used VALUE as the thing to count:

aggregate(cbind(count = VALUE) ~ MONTH.YEAR, 
          data = mydf, 
          FUN = function(x){NROW(x)})

which gives you:

  MONTH.YEAR count
1  FEB. 2012     2
2  JAN. 2012     2
3  MAR. 2012     1
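
One caveat with this approach, as far as I know: the formula interface of aggregate() drops rows with missing values by default (na.action = na.omit), so a row whose VALUE is NA would not be counted. A small sketch of the difference, using a made-up data frame df_na:

```r
# Hypothetical data: one JAN row has a missing VALUE.
df_na <- data.frame(MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012"),
                    VALUE = c(1000L, NA, 3000L),
                    stringsAsFactors = FALSE)

# Default na.action: the row with the missing VALUE is dropped before counting.
agg_default <- aggregate(cbind(count = VALUE) ~ MONTH.YEAR,
                         data = df_na, FUN = NROW)

# na.pass keeps rows with missing values, so every row is counted.
agg_keep <- aggregate(cbind(count = VALUE) ~ MONTH.YEAR,
                      data = df_na, FUN = NROW, na.action = na.pass)

# table() only looks at the grouping column, so VALUE does not matter.
tab <- table(df_na$MONTH.YEAR)
```

Here agg_default counts one row for JAN. 2012, while agg_keep and tab both count two.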

#4

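library(plyr)
# Backticks are needed because MONTH-YEAR is not a syntactic R name;
# unquoted, it would be parsed as MONTH minus YEAR.
ddply(data, .(`MONTH-YEAR`), nrow)

This will give you the answer if "MONTH-YEAR" is a variable in your data frame. First, try unique(data$`MONTH-YEAR`) and see if it returns the unique values (no duplicates).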

Then the simple split-apply-combine call above will return what you are looking for.

#5

Try using the count function in dplyr:

library(dplyr)
dat1_frame %>% 
    count(MONTH.YEAR)

I am not sure how you got MONTH-YEAR as a variable name. My R version does not allow for such a variable name, so I replaced it with MONTH.YEAR.
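
For what it's worth, a name like MONTH-YEAR is possible in R, it is just not syntactic: data.frame() and read.table() convert it to MONTH.YEAR unless check.names = FALSE is set, and a non-syntactic name has to be wrapped in backticks when used in code. A short base R sketch (df_raw is a made-up name):

```r
# check.names = FALSE keeps the non-syntactic column name as typed.
df_raw <- data.frame(`MONTH-YEAR` = c("JAN. 2012", "JAN. 2012", "FEB. 2012"),
                     check.names = FALSE, stringsAsFactors = FALSE)
names(df_raw)

# Without backticks, df_raw$MONTH-YEAR would be parsed as (df_raw$MONTH) - YEAR.
month_counts <- table(df_raw$`MONTH-YEAR`)
month_counts

# make.names() shows the syntactic name R would otherwise assign.
syntactic_name <- make.names("MONTH-YEAR")
syntactic_name
```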

As a side note, the mistake in your code was that dat1_frame %.% group_by(MONTH-YEAR) without a summarise function returns the original data frame with all of its rows, so nrow() just gives the total row count. So, you want to use:

dat1_frame %>%
    group_by(MONTH.YEAR) %>%
    summarise(count=n())

#6

For completeness, here is the data.table solution:

library(data.table)

mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L), 
                       MONTH.YEAR = c("JAN. 2012", "JAN. 2012", 
                                      "FEB. 2012", "FEB. 2012", 
                                      "MAR. 2012"), 
                       VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)), 
                  .Names = c("ID", "MONTH.YEAR", "VALUE"), 
                  class = "data.frame", row.names = c(NA, -5L))

setDT(mydf)
mydf[, .(`Number of rows` = .N), by = MONTH.YEAR]

   MONTH.YEAR Number of rows
1:  JAN. 2012              2
2:  FEB. 2012              2
3:  MAR. 2012              1

#7

Here is another way of using aggregate to count rows by group:

my.data <- read.table(text = '
    month.year    my.cov
      Jan.2000     apple
      Jan.2000      pear
      Jan.2000     peach
      Jan.2001     apple
      Jan.2001     peach
      Feb.2002      pear
', header = TRUE, stringsAsFactors = FALSE, na.strings = NA)

rows.per.group  <- aggregate(rep(1, length(my.data$month.year)),
                             by=list(my.data$month.year), sum)
rows.per.group

#    Group.1 x
# 1 Feb.2002 1
# 2 Jan.2000 3
# 3 Jan.2001 2

#8

Suppose we have a df_data data frame as below:

> df_data
   ID MONTH-YEAR VALUE
1 110   JAN.2012  1000
2 111   JAN.2012  2000
3 121   FEB.2012  3000
4 131   FEB.2012  4000
5 141   MAR.2012  5000

To count the number of rows in df_data grouped by the MONTH-YEAR column, you can use:

> summary(df_data$`MONTH-YEAR`)

FEB.2012 JAN.2012 MAR.2012 
   2        2        1

The summary function creates a table from the factor argument and then builds a named vector for the result (lines 7 and 8).
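
Note that this relies on the column being a factor. A minimal sketch of the difference, using a made-up vector m: for a character column, wrapping it in factor() (or calling table()) gives the counts.

```r
# Hypothetical month labels stored as plain character strings.
m <- c("JAN.2012", "JAN.2012", "FEB.2012", "FEB.2012", "MAR.2012")

# On a character vector, summary() reports length, class and mode, not counts.
s_chr <- summary(m)
names(s_chr)

# On a factor, summary() returns a named vector of counts per level.
s_fac <- summary(factor(m))
s_fac
```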
