This question already has an answer here:
- remove IDs that occur x times R (2 answers)
If I have a dataframe like this:
neu <- data.frame(test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f"))
neu
   test1 test2
1      1     a
2      2     b
3      3     a
4      4     b
5      5     c
6      6     c
7      7     a
8      8     c
9      9     c
10    10     d
11    11     d
12    12     f
13    13     f
14    14     f
and I would like to select only those rows where the level of the factor test2
appears more than, say, three times. What would be the fastest way?
Thanks very much, didn't really find the right answer in the previous questions.
4 Answers
#1
7
Find the levels that occur often enough using:
z <- table(neu$test2)[table(neu$test2) >= 3] # levels occurring 3 or more times; use > 3 for strictly more than three
Or:
z <- which(table(neu$test2) >= 3) # named integer vector, so names(z) gives the levels
Then subset with:
subset(neu, test2 %in% names(z))
Or:
neu[neu$test2 %in% names(z), ]
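The two steps above (count the levels, then keep the matching rows) can be folded into one small function. This is only a sketch: `keep_frequent` and its arguments are hypothetical names introduced here, not part of the original answer, and it uses a strict `>` comparison to match the "more than three times" wording of the question.

```r
# Hypothetical helper (not from the original answer): keep the rows of `df`
# whose value in column `col` occurs strictly more than `n` times.
keep_frequent <- function(df, col, n) {
  counts <- table(df[[col]])          # occurrences of each distinct value
  keep <- names(which(counts > n))    # values seen more than n times
  df[df[[col]] %in% keep, , drop = FALSE]
}

neu <- data.frame(test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
                  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f"))

res_strict <- keep_frequent(neu, "test2", 3)  # more than 3 occurrences
res_atleast3 <- keep_frequent(neu, "test2", 2)  # 3 or more occurrences
```

With the sample data, the strict version keeps only the rows for level c, while the "3 or more" version also keeps levels a and f, which is why the choice between `>` and `>=` matters here.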
#2
5
Here's another way:
with(neu, neu[ave(seq(test2), test2, FUN=length) > 3, ])
#   test1 test2
# 5     5     c
# 6     6     c
# 8     8     c
# 9     9     c
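To see why the `ave` one-liner works, it may help to look at the intermediate vector it builds. The sketch below uses a hypothetical variable name, `n_per_row`, and swaps `seq_along` in for `seq`. `ave` applies `length` within each group of `test2` and writes the result back at every row position, so each row ends up carrying the size of its own group.

```r
neu <- data.frame(test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
                  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f"))

# For every row, the number of rows sharing the same test2 value.
# n_per_row is an illustrative name, not part of the original answer.
n_per_row <- ave(seq_along(neu$test2), neu$test2, FUN = length)

# Keep only the rows whose group has more than three members.
res <- neu[n_per_row > 3, ]
```

Because `n_per_row` has one entry per row of `neu`, it can be used directly as a row filter without a separate lookup of level names.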
#3
3
I'd use count from the plyr package to perform the counting:
library(plyr)
count_result = count(neu, "test2")
matching = with(count_result, test2[freq > 3])
with(neu, test1[test2 %in% matching])
[1] 5 6 8 9
#4
2
The (better-scaling) data.table way:
library(data.table)
dt = data.table(neu)
dt[dt[, .I[.N >= 3], by = test2]$V1]
Note: hopefully, in the future, the following simpler syntax will be the fast way of doing this:
dt[, .SD[.N >= 3], by = test2]
(cf. Subset by group with data.table)