I have a data.table in R:
col1 col2 col3 col4
1: 5.1 3.5 1.4 setosa
2: 5.1 3.5 1.4 setosa
3: 4.7 3.2 1.3 setosa
4: 4.6 3.1 1.5 setosa
5: 5.0 3.6 1.4 setosa
6: 5.1 3.5 3.4 eer
7: 5.1 3.5 3.4 eer
8: 5.1 3.2 1.3 eer
9: 5.1 3.5 1.5 eer
10: 5.1 3.5 1.4 eer
DT <- structure(list(col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 5.1,
5.1, 5.1), col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.5,
3.5), col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4
), col4 = structure(c(1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L), .Label = c("setosa",
"versicolor", "virginica", "eer"), class = "factor")), .Names = c("col1",
"col2", "col3", "col4"), row.names = c(NA, -10L), class = c("data.table",
"data.frame"))
I want to count unique (distinct) combinations of col1 and col2 for each value of col4.
Expected output is
col1 col2 col3 col4 count
1: 5.1 3.5 1.4 setosa 4
2: 5.1 3.5 1.4 setosa 4
3: 4.7 3.2 1.3 setosa 4
4: 4.6 3.1 1.5 setosa 4
5: 5.0 3.6 1.4 setosa 4
6: 5.1 3.5 3.4 eer 2
7: 5.1 3.5 3.4 eer 2
8: 5.1 3.2 1.3 eer 2
9: 5.1 3.5 1.5 eer 2
10: 5.1 3.5 1.4 eer 2
How can I do this in a single data.table expression?
1 solution
#1
I had to go through a few attempts first, and ended up with this. Any good?
DT[, count:=nrow(unique(.SD)), by=col4, .SDcols=c("col1","col2")]
DT
col1 col2 col3 col4 count
1: 5.1 3.5 1.4 setosa 4
2: 5.1 3.5 1.4 setosa 4
3: 4.7 3.2 1.3 setosa 4
4: 4.6 3.1 1.5 setosa 4
5: 5.0 3.6 1.4 setosa 4
6: 5.1 3.5 3.4 eer 2
7: 5.1 3.5 3.4 eer 2
8: 5.1 3.2 1.3 eer 2
9: 5.1 3.5 1.5 eer 2
10: 5.1 3.5 1.4 eer 2
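To see where the 4 and the 2 come from, it can help to look at the distinct pairs themselves. The following is only a sketch: it rebuilds the question's data with data.table() instead of the dput output, and the names pairs and counts are made up for illustration.

```r
library(data.table)

# Same values as in the question, rebuilt with data.table() for brevity
DT <- data.table(
  col1 = c(5.1, 5.1, 4.7, 4.6, 5.0, 5.1, 5.1, 5.1, 5.1, 5.1),
  col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.5, 3.5),
  col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4),
  col4 = factor(rep(c("setosa", "eer"), each = 5))
)

# Within each col4 group, .SD holds only col1 and col2 (because of .SDcols),
# so unique(.SD) is that group's set of distinct (col1, col2) pairs
pairs <- DT[, unique(.SD), by = col4, .SDcols = c("col1", "col2")]
print(pairs)

# One row per group: how many distinct pairs it has
counts <- pairs[, .N, by = col4]
print(counts)
```

Using := instead, as in the answer, recycles each group's count across all of its rows, which is what produces the count column in the output above.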
The same result, but faster, thanks to Procrastinatus's comment below:
DT[, count:=uniqueN(.SD), by=col4, .SDcols=c("col1","col2")]
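As a quick sanity check, the two variants can be run side by side on the question's data. This is a minimal sketch, not a benchmark: the column names count_unique and count_uniqueN and the helper vector cols are hypothetical, chosen only so both results can sit in the same table.

```r
library(data.table)

# Same values as in the question, rebuilt with data.table() for brevity
DT <- data.table(
  col1 = c(5.1, 5.1, 4.7, 4.6, 5.0, 5.1, 5.1, 5.1, 5.1, 5.1),
  col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.5, 3.5),
  col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4),
  col4 = factor(rep(c("setosa", "eer"), each = 5))
)

cols <- c("col1", "col2")

# Variant 1: build the distinct rows per group, then count them
DT[, count_unique := nrow(unique(.SD)), by = col4, .SDcols = cols]

# Variant 2: uniqueN() counts the distinct rows without building them
DT[, count_uniqueN := uniqueN(.SD), by = col4, .SDcols = cols]

# Both columns should agree row for row
stopifnot(all(DT$count_unique == DT$count_uniqueN))
print(DT)
```

On ten rows the speed difference is invisible; it should only matter with many groups or large groups.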