When using dplyr on large data frames, I often apply several filtering conditions. I could usually put them all in a single filter() call, but I like how dplyr lets you think through the data step by step, so the filters often end up on successive lines.
However, I often want to keep not only the observations that pass these successive filters in a new df, but also the observations from the original df that were filtered out, in a separate df.
For example, take this dataset:
set.seed(123)
colors <- c(rep("yellow", 5), rep("blue", 5), rep("green", 5))
shapes <- c("circle", "star", "oblong")
numbers <- sample(1:15, replace = TRUE)
group <- sample(LETTERS, 15, replace = TRUE)
mydf <- data.frame(colors, shapes, numbers, group)
mydf
   colors shapes numbers group
1  yellow circle       5     X
2  yellow   star      12     G
3  yellow oblong       7     B
4  yellow circle      14     I
5  yellow   star      15     Y
6    blue oblong       1     X
7    blue circle       8     S
8    blue   star      14     Q
9    blue oblong       9     Z
10   blue circle       7     R
11  green   star      15     S
12  green oblong       7     O
13  green circle      11     P
14  green   star       9     H
15  green oblong       2     D
Say I want to filter by the following rules (I know it might make more sense to filter in a different order, e.g. by color first, but for the sake of argument):
mydf %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group == "X" | group == "Y" | group == "Z") %>%
  filter(colors == "yellow")
which returns:
  colors shapes numbers group
1 yellow circle       5     X
2 yellow   star      15     Y
My question is: how can I keep the 13 observations from the original 'mydf' that the filters did not return, in a separate df? Is there a cute dplyr way?
1 solution
#1
I'd suggest the following, where mydf.filtered holds the result of the filters above:
sepDf <- setdiff(mydf, mydf.filtered)
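To make that one-liner concrete, here is a minimal sketch. It assumes mydf.filtered is simply the output of the filter chain from the question, and that dplyr is attached so setdiff() dispatches to the data frame method. One thing to watch: dplyr's set operations drop duplicate rows, so if the original data can contain identical rows, the complement may come out shorter than expected. The second half shows a possible alternative using a row id and anti_join(); the names mydf_id, row_id, kept and rest are made up for illustration.

```r
library(dplyr)

# Rebuild the data frame from the question.
# (The sampled values depend on the R version, so the rows may differ
# from the printout in the question.)
set.seed(123)
colors  <- c(rep("yellow", 5), rep("blue", 5), rep("green", 5))
shapes  <- c("circle", "star", "oblong")
numbers <- sample(1:15, replace = TRUE)
group   <- sample(LETTERS, 15, replace = TRUE)
mydf    <- data.frame(colors, shapes, numbers, group)

# Assumption: mydf.filtered is the result of the successive filters.
mydf.filtered <- mydf %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group == "X" | group == "Y" | group == "Z") %>%
  filter(colors == "yellow")

# Rows of mydf that are not in mydf.filtered.
# Note: duplicate rows are collapsed by dplyr's setdiff().
sepDf <- setdiff(mydf, mydf.filtered)

# Alternative sketch that keeps duplicates: tag each row with an id,
# filter as before, then anti-join on the id.
mydf_id <- mutate(mydf, row_id = row_number())

kept <- mydf_id %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group == "X" | group == "Y" | group == "Z") %>%
  filter(colors == "yellow")

rest <- anti_join(mydf_id, kept, by = "row_id")
```

With the row id approach, kept and rest always partition the original rows, so nrow(kept) + nrow(rest) equals nrow(mydf); the row_id column can be dropped afterwards with select(-row_id).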