I know this can be achieved with other packages, but I'm trying to do it in data.table (as it seems to be the fastest for grouping).
library(data.table)
dt = data.table(a=c(1,2,2,3))
dt[,length(a),by=a]
results in
a V1
1: 1 1
2: 2 1
3: 3 1
whereas
library(plyr)
df = data.frame(a=c(1,2,2,3))
ddply(df,.(a),summarise,V1=length(a))
produces
a V1
1 1 1
2 2 2
3 3 1
which is a more sensible result. I'm wondering why data.table doesn't give the same result, and how I can get it.
The data.table way to do this is to use the special variable .N, which holds the number of rows in the current group. (Other special variables include .SD, .BY (in version 1.8.2), and .I and .GRP (available from version 1.8.3); all are documented in ?data.table.)
library(data.table)
dt = data.table(a=c(1,2,2,3))
dt[, .N, by = a]
# a N
# 1: 1 1
# 2: 2 2
# 3: 3 1
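To see several of the special variables side by side, here is a short sketch. It assumes a data.table version recent enough to support the .() alias and .GRP; the result column names n, grp, rows and by_value are made up for illustration.

```r
library(data.table)

dt = data.table(a = c(1, 2, 2, 3))

# For each group of `a`:
#   .N    number of rows in the group
#   .GRP  group counter (1, 2, 3, ...)
#   .I    row numbers of the group's rows in dt (wrapped in list() so each
#         group's indices stay together in one list-column cell)
#   .BY   list of the current group values, so .BY$a is the current `a`
res = dt[, .(n = .N, grp = .GRP, rows = list(.I), by_value = .BY$a), by = a]
print(res)
```

For the group a == 2, n is 2 and rows holds the row numbers 2 and 3.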
To see why what you tried didn't work, run the following and check the values of a and length(a) at each browser prompt:
dt[, browser(), by = a]
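If you would rather not step through browser() interactively, the following minimal sketch shows the same thing in one call. Inside j, the grouping column a holds only the current group's value (length 1), so length(a) is always 1, while .N counts the group's rows. The columns len_a and n, and the second table dt2 with its column b, are hypothetical and only for illustration.

```r
library(data.table)

dt = data.table(a = c(1, 2, 2, 3))

# The grouping column `a` is length 1 inside j; .N is the group's row count
res = dt[, .(len_a = length(a), n = .N), by = a]
print(res)

# A non-grouping column is subset to all rows of the group,
# so its length also gives the group size
dt2 = data.table(a = c(1, 2, 2, 3), b = 1:4)
res2 = dt2[, length(b), by = a]
print(res2)
```

In practice .N is the idiomatic choice, because it does not depend on picking some other column to measure.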