Identifying observations not included by a filter with dplyr - R

Date: 2021-08-13 22:25:47

When using dplyr on large data frames, I often apply several filtering conditions. I could usually put them all in a single filter() call. However, I like the way dplyr lets you think gradually about what you're doing with the data, so these filters often end up on successive lines.

However, I often want to keep not only the observations produced by these successive filters in a new df, but also, in a separate df, the observations from the original df that the filters excluded.

For example this dataset:

library(dplyr)

set.seed(123)
colors <- c(rep("yellow", 5), rep("blue", 5), rep("green", 5))
shapes <- c("circle", "star", "oblong")  # recycled to 15 rows by data.frame()
numbers <- sample(1:15, replace = TRUE)
group <- sample(LETTERS, 15, replace = TRUE)
mydf <- data.frame(colors, shapes, numbers, group)
mydf


   colors shapes numbers group
1  yellow circle       5     X
2  yellow   star      12     G
3  yellow oblong       7     B
4  yellow circle      14     I
5  yellow   star      15     Y
6    blue oblong       1     X
7    blue circle       8     S
8    blue   star      14     Q
9    blue oblong       9     Z
10   blue circle       7     R
11  green   star      15     S
12  green oblong       7     O
13  green circle      11     P
14  green   star       9     H
15  green oblong       2     D

Here, let's say I wanted to filter by the following rules (I'm aware that it may make more sense to filter in a different order, e.g. by color first, but for the sake of argument):

mydf %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group == "X" | group == "Y" | group == "Z") %>%
  filter(colors == "yellow")

which returns:

  colors shapes numbers group
1 yellow circle       5     X
2 yellow   star      15     Y

My question is: how could I keep the 13 observations from the original 'mydf' that were not returned by the filter in a separate df? Is there a cute dplyr way?

1 answer

#1

I'd suggest the following, where mydf.filtered holds the result of the filter chain from the question:

sepDf <- setdiff(mydf, mydf.filtered)
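
To make that one-liner concrete, here is a minimal end-to-end sketch. It rests on a few assumptions that are not in the original posts: the data frame is typed in from the table printed in the question (so it does not depend on what sample() returns in a given R version), the group filter is written with %in% as an equivalent shorthand, and sepDf2 with anti_join() is a hypothetical alternative, not something the answer proposed.

```r
library(dplyr)

# Data typed in from the table printed in the question, so the example
# does not depend on how sample() behaves in a given R version.
mydf <- data.frame(
  colors  = rep(c("yellow", "blue", "green"), each = 5),
  shapes  = rep(c("circle", "star", "oblong"), times = 5),
  numbers = c(5, 12, 7, 14, 15, 1, 8, 14, 9, 7, 15, 7, 11, 9, 2),
  group   = c("X", "G", "B", "I", "Y", "X", "S", "Q", "Z", "R",
              "S", "O", "P", "H", "D"),
  stringsAsFactors = FALSE
)

# Store the result of the successive filters (name taken from the answer).
mydf.filtered <- mydf %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group %in% c("X", "Y", "Z")) %>%
  filter(colors == "yellow")

# Rows of mydf that are not in mydf.filtered. The dplyr version is called
# explicitly; note that it also drops duplicate rows.
sepDf <- dplyr::setdiff(mydf, mydf.filtered)

# Hypothetical alternative: anti_join() keeps duplicated rows of mydf
# as long as they have no match in mydf.filtered.
sepDf2 <- anti_join(mydf, mydf.filtered, by = names(mydf))

nrow(mydf.filtered)
nrow(sepDf)
```

With the question's data this leaves 2 rows in mydf.filtered and 13 in sepDf. If the original data frame can contain fully duplicated rows that you need to keep, the anti_join() variant is probably the safer choice, since the set operations in dplyr remove duplicates.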
