I'm using the data.table package to speed up some summary statistic collection on a data set.
I'm curious if there's a way to group by more than one column. My data looks like this:
purchaseAmt  adShown  url
15.54        00001    150000001
4.82         00002    150000001
157.99       05005    776300044
...          ...      ...
I can do something like this:
adShownMedian <- df1[,median(purchaseAmt),by="adShown"]
to get each ad's median purchase amount. How would I group by the combination of adShown and url?
I've tried this:
adShownMedian <- df1[,median(purchaseAmt),by=c("adShown","url")]
but no luck.
Any suggestions?
1 solution
Use by=list(adShown,url) instead of by=c("adShown","url")
Example:

set.seed(007)
DF <- data.frame(X=1:20, Y=sample(c(0,1), 20, TRUE), Z=sample(0:5, 20, TRUE))

library(data.table)
DT <- data.table(DF)
DT[, Mean:=mean(X), by=list(Y, Z)]

     X Y Z      Mean
 1:  1 1 3  1.000000
 2:  2 0 1  9.333333
 3:  3 0 5  7.400000
 4:  4 0 5  7.400000
 5:  5 0 5  7.400000
 6:  6 1 0  6.000000
 7:  7 0 3  7.000000
 8:  8 1 2 12.500000
 9:  9 0 5  7.400000
10: 10 0 2 15.000000
11: 11 0 4 14.500000
12: 12 0 1  9.333333
13: 13 1 1 13.000000
14: 14 0 1  9.333333
15: 15 0 2 15.000000
16: 16 0 5  7.400000
17: 17 1 2 12.500000
18: 18 0 4 14.500000
19: 19 1 5 19.000000
20: 20 0 2 15.000000
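Applied to the data in the question, the same idea might look like the sketch below. The rows of df1 here are made up for illustration (only the column names come from the question), and medianAmt is a hypothetical name for the result column.

```r
library(data.table)

# Hypothetical sample rows shaped like the table in the question;
# adShown and url are stored as character to keep the leading zeros.
df1 <- data.table(
  purchaseAmt = c(15.54, 4.82, 157.99, 20.00, 10.00),
  adShown     = c("00001", "00002", "05005", "00001", "00001"),
  url         = c("150000001", "150000001", "776300044", "150000001", "776300044")
)

# One median per distinct (adShown, url) pair.
# medianAmt is an illustrative column name, not from the question.
adShownMedian <- df1[, list(medianAmt = median(purchaseAmt)),
                     by = list(adShown, url)]

print(adShownMedian)
```

Unlike the := form in the example above, which adds a column to every row of the table, this form returns one row per group. Recent data.table versions should also accept a character vector such as by=c("adShown","url"), so if that form fails it may be worth checking the installed package version.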