This question already has an answer here:
- Count number of rows within each group (10 answers)
I have a data frame in R like this:
ID MONTH-YEAR VALUE
110 JAN. 2012 1000
111 JAN. 2012 2000
...
121 FEB. 2012 3000
131 FEB. 2012 4000
...
So, for each month of each year there are n rows, and they can be in any order (i.e., the rows for a given month are not contiguous but scattered throughout the data frame). I want to count how many rows there are for each MONTH-YEAR, i.e., how many rows for JAN. 2012, how many for FEB. 2012, and so on. Something like this:
MONTH-YEAR NUMBER OF ROWS
JAN. 2012 10
FEB. 2012 13
MAR. 2012 6
APR. 2012 9
I tried to do this:
n_row <- nrow(dat1_frame %.% group_by(MONTH-YEAR))
but it does not produce the desired output. How can I do that?
8 answers
#1
Here's an example that shows how table(.) (or, more closely matching your desired output, data.frame(table(.))) does what it sounds like you are asking for.
Note also how to share reproducible sample data in a way that others can copy and paste into their session.
Here's the (reproducible) sample data:
mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
"FEB. 2012", "FEB. 2012",
"MAR. 2012"),
VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
.Names = c("ID", "MONTH.YEAR", "VALUE"),
class = "data.frame", row.names = c(NA, -5L))
mydf
# ID MONTH.YEAR VALUE
# 1 110 JAN. 2012 1000
# 2 111 JAN. 2012 2000
# 3 121 FEB. 2012 3000
# 4 131 FEB. 2012 4000
# 5 141 MAR. 2012 5000
Here's the calculation of the number of rows per group, in two output display formats:
table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
# 2 2 1
data.frame(table(mydf$MONTH.YEAR))
# Var1 Freq
# 1 FEB. 2012 2
# 2 JAN. 2012 2
# 3 MAR. 2012 1
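table() sorts the groups alphabetically (FEB before JAN), whereas the desired output in the question lists months in calendar order. A minimal sketch, assuming every label looks like "JAN. 2012" (three-letter English month abbreviation, a dot, a space, a four-digit year): turn each label into the first day of its month, use that to order the factor levels, then rename the columns. The column names MONTH.YEAR and NUMBER.OF.ROWS are my own choice, and the sample data is re-created with data.frame() so the block runs on its own.

```r
mydf <- data.frame(
  ID = c(110L, 111L, 121L, 131L, 141L),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012"),
  VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L),
  stringsAsFactors = FALSE
)

# Assumed label format: "MON. YYYY" with an English month abbreviation.
month_labels <- unique(mydf$MONTH.YEAR)
month_num <- match(toupper(substr(month_labels, 1, 3)), toupper(month.abb))
year_num  <- as.integer(substr(month_labels, nchar(month_labels) - 3, nchar(month_labels)))
first_day <- as.Date(sprintf("%04d-%02d-01", year_num, month_num))

# Chronologically ordered factor levels; row order in mydf does not matter.
grp <- factor(mydf$MONTH.YEAR, levels = month_labels[order(first_day)])

# Tabulate and give the columns descriptive names.
counts <- as.data.frame(table(grp), responseName = "NUMBER.OF.ROWS")
names(counts)[1] <- "MONTH.YEAR"
counts
```

If the real labels follow a different pattern, only the two substr() lines need to change.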
#2
The count() function in plyr does what you want:
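library(plyr)
# count() takes the data frame and the grouping column name(s) as a character
# vector; the result has one row per group plus a "freq" column.
count(mydf, "MONTH.YEAR")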
#3
Using the example data set that Ananda dummied up, here's an example using aggregate(), which is part of base R. aggregate() just needs some column to count as a function of the different values of MONTH.YEAR. In this case, I used VALUE as the thing to count:
aggregate(cbind(count = VALUE) ~ MONTH.YEAR,
data = mydf,
FUN = function(x){NROW(x)})
which gives you:
MONTH.YEAR count
1 FEB. 2012 2
2 JAN. 2012 2
3 MAR. 2012 1
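One caveat with the formula interface: aggregate() drops any row with a missing value in a variable of the formula (its default na.action is na.omit), so counting VALUE undercounts groups where VALUE is NA. A sketch using a hypothetical copy of the sample data with one VALUE set to NA, comparing the default with na.action = na.pass:

```r
mydf <- data.frame(
  ID = c(110L, 111L, 121L, 131L, 141L),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012"),
  VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L),
  stringsAsFactors = FALSE
)
mydf_na <- mydf
mydf_na$VALUE[2] <- NA   # hypothetical missing value in a JAN. 2012 row

# Default na.action = na.omit: the JAN. 2012 row with NA is silently dropped.
dropped <- aggregate(cbind(count = VALUE) ~ MONTH.YEAR, data = mydf_na, FUN = length)

# na.action = na.pass keeps every row, so length() counts all of them.
kept <- aggregate(cbind(count = VALUE) ~ MONTH.YEAR, data = mydf_na,
                  FUN = length, na.action = na.pass)
dropped
kept
```

Using length() instead of function(x){NROW(x)} gives the same counts for a single column.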
#4
library(plyr)
# Split data by MONTH.YEAR, apply nrow() to each piece, combine the results.
ddply(data, .(MONTH.YEAR), nrow)
This will give you the answer, provided MONTH.YEAR (written MONTH-YEAR in the question) is a column of data, your data frame. First, run unique(data$MONTH.YEAR) and check that each month-year appears only once in that list, i.e., the same month is not written in two different ways.
Then the simple split-apply-combine above will return what you are looking for.
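The split-apply-combine idea behind ddply() can also be written in base R alone, which may help if plyr is not installed. A minimal sketch on the sample data used in the other answers (re-created here; the output column name count is my own choice): split the data frame by group, apply nrow() to each piece, and combine the results.

```r
mydf <- data.frame(
  ID = c(110L, 111L, 121L, 131L, 141L),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012"),
  VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L),
  stringsAsFactors = FALSE
)

# Split: one data frame per MONTH.YEAR value, wherever those rows occur.
pieces <- split(mydf, mydf$MONTH.YEAR)

# Apply: count the rows of each piece.
n_rows <- vapply(pieces, nrow, integer(1))

# Combine: collect the per-group counts into a data frame.
result <- data.frame(MONTH.YEAR = names(n_rows), count = unname(n_rows),
                     stringsAsFactors = FALSE)
result
```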
#5
Try using the count function in dplyr:
library(dplyr)
dat1_frame %>%
count(MONTH.YEAR)
I am not sure how you got MONTH-YEAR as a variable name. It is not a syntactic R name: data.frame() and read.table() convert it to MONTH.YEAR by default, and written unquoted it is parsed as MONTH minus YEAR. So I replaced it with MONTH.YEAR.
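For reference, a column literally named MONTH-YEAR can exist; it just needs special handling. data.frame() and read.table() both default to check.names = TRUE, which rewrites the name to MONTH.YEAR; with check.names = FALSE the hyphen survives, and the column must then be quoted with backticks. A small sketch with made-up values:

```r
# Default check.names = TRUE: the invalid name becomes MONTH.YEAR.
d1 <- data.frame(`MONTH-YEAR` = c("JAN. 2012", "FEB. 2012", "JAN. 2012"),
                 stringsAsFactors = FALSE)
names(d1)   # "MONTH.YEAR"

# check.names = FALSE keeps the hyphen, so the column needs backticks.
d2 <- data.frame(`MONTH-YEAR` = c("JAN. 2012", "FEB. 2012", "JAN. 2012"),
                 check.names = FALSE, stringsAsFactors = FALSE)
names(d2)   # "MONTH-YEAR"
table(d2$`MONTH-YEAR`)

# Without backticks, MONTH-YEAR is the expression MONTH minus YEAR,
# which is why group_by(MONTH-YEAR) does not refer to this column.
```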
As a side note, the mistake in your code was that dat1_frame %.% group_by(MONTH-YEAR) without a summarise() call returns the original rows unchanged (only grouping information is attached), so nrow() just gives the total number of rows. Also, %.% is an old dplyr operator that has since been deprecated in favor of %>%. So, you want to use:
dat1_frame %>%
group_by(MONTH.YEAR) %>%
summarise(count=n())
#6
Just for completeness, the data.table solution:
library(data.table)
mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
"FEB. 2012", "FEB. 2012",
"MAR. 2012"),
VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
.Names = c("ID", "MONTH.YEAR", "VALUE"),
class = "data.frame", row.names = c(NA, -5L))
setDT(mydf)
mydf[, .(`Number of rows` = .N), by = MONTH.YEAR]
MONTH.YEAR Number of rows
1: JAN. 2012 2
2: FEB. 2012 2
3: MAR. 2012 1
#7
Here is another way of using aggregate() to count rows by group:
my.data <- read.table(text = '
month.year my.cov
Jan.2000 apple
Jan.2000 pear
Jan.2000 peach
Jan.2001 apple
Jan.2001 peach
Feb.2002 pear
', header = TRUE, stringsAsFactors = FALSE, na.strings = NA)
rows.per.group <- aggregate(rep(1, length(my.data$month.year)),
by=list(my.data$month.year), sum)
rows.per.group
# Group.1 x
# 1 Feb.2002 1
# 2 Jan.2000 3
# 3 Jan.2001 2
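The same "sum a column of ones per group" trick can be written with base R's rowsum(), which sums a numeric vector or matrix within each level of a grouping variable. A sketch on the my.data example above, re-created with data.frame() so the block runs on its own:

```r
my.data <- data.frame(
  month.year = c("Jan.2000", "Jan.2000", "Jan.2000", "Jan.2001", "Jan.2001", "Feb.2002"),
  my.cov     = c("apple", "pear", "peach", "apple", "peach", "pear"),
  stringsAsFactors = FALSE
)

# One 1 per row, summed within each month.year group. The result is a
# one-column matrix whose row names are the sorted group labels.
rows.per.group <- rowsum(rep(1L, nrow(my.data)), group = my.data$month.year)
rows.per.group
```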
#8
Suppose we have a data frame df_data like the one below:
> df_data
ID MONTH-YEAR VALUE
1 110 JAN.2012 1000
2 111 JAN.2012 2000
3 121 FEB.2012 3000
4 131 FEB.2012 4000
5 141 MAR.2012 5000
To count the number of rows in df_data grouped by the MONTH-YEAR column, you can use:
> summary(df_data$`MONTH-YEAR`)
FEB.2012 JAN.2012 MAR.2012
2 2 1
The summary() function creates a table of counts from a factor argument and returns it as a named vector (the two output lines above). This relies on MONTH-YEAR being a factor; for a character column, summary() only reports its length, class and mode.
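Since R 4.0.0, data.frame() and read.table() no longer convert strings to factors by default, so MONTH-YEAR may well be a character column. A sketch using the values from df_data above, showing the difference and converting with factor() first:

```r
month_year <- c("JAN.2012", "JAN.2012", "FEB.2012", "FEB.2012", "MAR.2012")

# On a character vector, summary() reports only Length, Class and Mode.
summary(month_year)

# On a factor, summary() returns the count of each level as a named vector.
counts <- summary(factor(month_year))
counts
```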