Identifying observations not included by a filter with dplyr - R

Time: 2021-08-13 22:25:47

When using dplyr on large dataframes, I often apply several filtering conditions. I could usually put them all in a single filter call. However, I like the way dplyr lets you think gradually about what you're doing with the data, so these filters often end up on successive lines.


However, I often want to keep not only the observations produced by these successive filters in a new df, but also, in a separate df, the observations from the original df that the filters excluded.


For example this dataset:


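colors <- c(rep("yellow", 5), rep("blue", 5), rep("green", 5))
shapes <- c("circle", "star", "oblong")  # recycled to 15 rows by data.frame()
# numbers and mydf were not shown in the post; assumed from the printed output
numbers <- sample(1:15, 15, replace = TRUE)
group <- sample(LETTERS, 15, replace = TRUE)
mydf <- data.frame(colors, shapes, numbers, group)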

   colors shapes numbers group
1  yellow circle       5     X
2  yellow   star      12     G
3  yellow oblong       7     B
4  yellow circle      14     I
5  yellow   star      15     Y
6    blue oblong       1     X
7    blue circle       8     S
8    blue   star      14     Q
9    blue oblong       9     Z
10   blue circle       7     R
11  green   star      15     S
12  green oblong       7     O
13  green circle      11     P
14  green   star       9     H
15  green oblong       2     D

Here, let's say I wanted to filter by the following rules (I'm aware that it may make more sense to filter in a different order, e.g. by color first, but for the sake of argument):


mydf %>% 
  filter (numbers <= 5 | numbers >= 12) %>% 
  filter (group=="X" | group =="Y" | group == "Z") %>% 
  filter (colors=="yellow")

which returns:


  colors shapes numbers group
1 yellow circle       5     X
2 yellow   star      15     Y

My question is: how could I keep the 13 observations from the original 'mydf' that the filter did not return, in a separate df? Is there a cute dplyr way?


1 solution



I'd suggest


sepDf <- setdiff(mydf, mydf.filtered)
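
To make that concrete, here is a minimal sketch of how the line above could fit together with the question's filter. It assumes the filtered result is first stored as mydf.filtered (the question never assigns it), and it hard-codes the values printed in the question's table so the result is reproducible. The anti_join() line and the sepDf2 name are my own additions for comparison, not part of the original suggestion.

```r
library(dplyr)

# Data copied from the table printed in the question
mydf <- data.frame(
  colors  = rep(c("yellow", "blue", "green"), each = 5),
  shapes  = rep(c("circle", "star", "oblong"), length.out = 15),
  numbers = c(5, 12, 7, 14, 15, 1, 8, 14, 9, 7, 15, 7, 11, 9, 2),
  group   = c("X", "G", "B", "I", "Y", "X", "S", "Q", "Z", "R",
              "S", "O", "P", "H", "D")
)

# Store the filtered rows under the name used in the answer (assumed)
mydf.filtered <- mydf %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group %in% c("X", "Y", "Z")) %>%
  filter(colors == "yellow")

# Rows of mydf that do not appear in mydf.filtered
sepDf <- setdiff(mydf, mydf.filtered)

# Alternative sketch: anti_join() keeps rows of mydf with no match
# in mydf.filtered, here matching on every column
sepDf2 <- anti_join(mydf, mydf.filtered, by = names(mydf))
```

One thing to be aware of: as far as I know, dplyr's setdiff() treats the data frames as sets, so exact duplicate rows in mydf would be collapsed, whereas anti_join() keeps them. For this example all 15 rows are distinct, so both approaches should leave the same 13 rows.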


