How do I remove groups containing fewer than 3 rows of data in R? [duplicate]

This question already has an answer here:

Subset data frame based on number of rows per group (2 answers)

I'm using the dplyr package in R and have grouped my data by 3 variables (Year, Site, Brood).

I want to get rid of groups made up of fewer than 3 rows. For example, in the following sample I would like to remove the rows for brood '2'. I have a lot of data to do this with, so while I could painstakingly do it by hand, it would be very helpful to automate it using R.

Year Site Brood Parents
1996 A    1     1  
1996 A    1     1  
1996 A    1     0  
1996 A    1     0  
1996 A    2     1      
1996 A    2     0  
1996 A    3     1  
1996 A    3     1  
1996 A    3     1  
1996 A    3     0  
1996 A    3     1

I hope this makes sense and thank you very much in advance for your help! I'm new to R and * so apologies if the way I've worded this question isn't very good! Let me know if I need to provide any other information.

3 solutions

#1

One way to do it is to use the magic n() function within filter:

library(dplyr)

my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))

my_data %>% 
  group_by(Year, Site, Brood) %>% 
  filter(n() >= 3)

The n() function gives the number of rows in the current group (or the number of rows total if there is no grouping).
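To see this on the data from the question, here is a minimal sketch. The data frame name `broods` and the result name `kept` are made up for illustration; the values are copied from the sample table above.

```r
library(dplyr)

# Sample data from the question: broods 1, 2 and 3 have 4, 2 and 5 rows
broods <- data.frame(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

kept <- broods %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3) %>%   # keep only groups with at least 3 rows
  ungroup()              # drop the grouping so later steps see a plain table

unique(kept$Brood)  # brood 2 no longer appears
```

The two rows for brood 2 are dropped and the other nine rows are kept. The trailing `ungroup()` is optional; it just avoids carrying the grouping into later operations.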

#2

Throwing the data.table approach here to join the party:

library(data.table)
setDT(my_data)
my_data[ , if (.N >= 3) .SD, by = .(Year, Site, Brood)]
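For anyone unfamiliar with the symbols used here: `.N` is the number of rows in the current group and `.SD` is that group's data (the columns not used in `by`). A self-contained sketch on the question's sample data follows; the names `broods` and `kept` are invented for the example.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
broods <- data.table(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

# For each group, return its rows (.SD) only when the group has >= 3 rows (.N);
# groups that fail the test return NULL and so contribute nothing
kept <- broods[, if (.N >= 3) .SD, by = .(Year, Site, Brood)]

unique(kept$Brood)  # brood 2 no longer appears
```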

#3

You can also do this using base R:

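# `folder` is assumed to be a path ending in "/" that contains test.csv
temp <- read.csv(paste0(folder, "test.csv"), header=TRUE, sep=",")
# Count the rows in each Year/Site/Brood group
matches <- aggregate(Parents ~ Year + Site + Brood, temp, FUN=length)
# After the merge, Parents.x is the original column and Parents.y is the group size
temp <- merge(temp, matches, by=c("Year","Site","Brood"))
temp <- temp[temp$Parents.y >= 3, c(1,2,3,4)]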
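A shorter base R variant, offered as an alternative sketch and not as part of the answer above, uses `ave()` to compute each row's group size in place, so no merge is needed. It builds the question's sample data inline instead of reading a file; the names `broods`, `group_size` and `kept` are made up for the example.

```r
# Sample data from the question
broods <- data.frame(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

# ave() applies FUN within each group and returns one value per row,
# here the number of rows in that row's Year/Site/Brood group
group_size <- ave(seq_len(nrow(broods)),
                  broods$Year, broods$Site, broods$Brood,
                  FUN = length)

# Keep only rows whose group has at least 3 rows
kept <- broods[group_size >= 3, ]

unique(kept$Brood)  # brood 2 no longer appears
```

This keeps the original column names and row order, whereas the merge-based version renames `Parents` to `Parents.x` and may reorder rows.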
