I want to filter data frame x, keeping only the IDs whose rows contain every value of testVector in the Hour column.
ID <- c('A','A','A','A','A','B','B','B','B','C','C')
Hour <- c('0','2','5','6','9','0','2','5','6','0','2')
x <- data.frame(ID, Hour)
x
ID Hour
1 A 0
2 A 2
3 A 5
4 A 6
5 A 9
6 B 0
7 B 2
8 B 5
9 B 6
10 C 0
11 C 2
testVector <- c('0','2','5')
The solution should yield the following data frame:
x
ID Hour
1 A 0
2 A 2
3 A 5
4 A 6
5 A 9
6 B 0
7 B 2
8 B 5
9 B 6
All rows for ID C were dropped because it has no row with Hour 5. Note that I want to keep every Hour row for the IDs that match testVector, not only the rows whose Hour appears in testVector.
A dplyr solution would be ideal, but any solution is welcome.
Based on other related questions on SO, I'm guessing I need some combination of %in% and all, but I can't quite figure it out.
3 Answers
#1
2
Here's another dplyr solution that never leaves the pipe:
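library(dplyr)

ID <- c('A','A','A','A','A','B','B','B','B','C','C')
Hour <- c('0','2','5','6','9','0','2','5','6','0','2')
x <- data.frame(ID, Hour)
testVector <- c('0','2','5')

x %>%
  group_by(ID) %>%
  # flag the rows whose Hour is one of the required values
  mutate(contains = Hour %in% testVector) %>%
  # count flagged rows per ID (assumes each ID/Hour pair appears at most once)
  summarise(all = sum(contains)) %>%
  # keep the IDs that matched every required value
  filter(all >= length(testVector)) %>%
  select(-all) %>%
  # pull back every row of the surviving IDs
  inner_join(x, by = "ID")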
## ID Hour
## <fctr> <fctr>
## 1 A 0
## 2 A 2
## 3 A 5
## 4 A 6
## 5 A 9
## 6 B 0
## 7 B 2
## 8 B 5
## 9 B 6
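A more direct variant of the same idea is sketched below. It is not part of the answer above; it assumes dplyr is installed and relies on the fact that filter() on a grouped data frame evaluates its condition once per group, so a single TRUE or FALSE keeps or drops the whole ID. The name result is only illustrative.

```r
library(dplyr)

x <- data.frame(
  ID   = c('A','A','A','A','A','B','B','B','B','C','C'),
  Hour = c('0','2','5','6','9','0','2','5','6','0','2'),
  stringsAsFactors = FALSE
)
testVector <- c('0','2','5')

result <- x %>%
  group_by(ID) %>%
  # TRUE only when every required value occurs among this ID's hours
  filter(all(testVector %in% Hour)) %>%
  ungroup()

result
```

Because the test is all(testVector %in% Hour) rather than a row count, repeated Hour values within an ID should not affect which IDs are kept.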
#2
4
Your idea of combining %in% and all sounds promising. In base R you could use them as follows:
to_keep = sapply(lapply(split(x,x$ID),function(x) {unique(x$Hour)}),
function(x) {all(testVector %in% x)})
x = x[x$ID %in% names(to_keep)[to_keep],]
Or similarly, but skipping an unnecessary lapply, which is more efficient, as suggested by d.b. in the comments:
temp = sapply(split(x, x$ID), function(a) all(testVector %in% a$Hour))
x[temp[match(x$ID, names(temp))],]
Output:
ID Hour
1 A 0
2 A 2
3 A 5
4 A 6
5 A 9
6 B 0
7 B 2
8 B 5
9 B 6
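To see why this works, it can help to look at the intermediate per-ID result. The sketch below rebuilds the example data and uses vapply instead of sapply purely as a stylistic choice; the names hours_by_id, has_all, and kept are illustrative and do not appear in the answer.

```r
x <- data.frame(
  ID   = c('A','A','A','A','A','B','B','B','B','C','C'),
  Hour = c('0','2','5','6','9','0','2','5','6','0','2'),
  stringsAsFactors = FALSE
)
testVector <- c('0','2','5')

# one character vector of hours per ID
hours_by_id <- split(x$Hour, x$ID)

# named logical vector: does each ID contain every required hour?
has_all <- vapply(hours_by_id, function(h) all(testVector %in% h), logical(1))
has_all

# keep all rows of the IDs flagged TRUE
kept <- x[x$ID %in% names(has_all)[has_all], ]
kept
```

Here has_all is TRUE for A and B and FALSE for C, which is exactly the set of IDs the final subset keeps.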
Hope this helps!
#3
2
Here is an option using table from base R:
i1 <- !rowSums(table(x)[, testVector]==0)
subset(x, ID %in% names(i1)[i1])
# ID Hour
#1 A 0
#2 A 2
#3 A 5
#4 A 6
#5 A 9
#6 B 0
#7 B 2
#8 B 5
#9 B 6
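The one-liner is easier to follow when unpacked. In the sketch below, table(x) builds an ID-by-Hour contingency table, the testVector columns are selected by name, and any zero in a row means that ID lacks a required hour. The names counts, n_missing, and keep_ids are illustrative. One caveat to be aware of: selecting columns by name should fail with a subscript error if a value of testVector never occurs in the data at all.

```r
x <- data.frame(
  ID   = c('A','A','A','A','A','B','B','B','B','C','C'),
  Hour = c('0','2','5','6','9','0','2','5','6','0','2'),
  stringsAsFactors = FALSE
)
testVector <- c('0','2','5')

# rows are IDs, columns are Hour values, cells are row counts
counts <- table(x)

# number of required hours each ID is missing
n_missing <- rowSums(counts[, testVector] == 0)
n_missing

# IDs with nothing missing
keep_ids <- names(n_missing)[n_missing == 0]
subset(x, ID %in% keep_ids)
```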
Or this can be done with data.table:
library(data.table)
setDT(x)[, .SD[all(testVector %in% Hour)], ID]
# ID Hour
#1: A 0
#2: A 2
#3: A 5
#4: A 6
#5: A 9
#6: B 0
#7: B 2
#8: B 5
#9: B 6