This must be a duplicate but I can't find it. So here goes.
I have a data.frame with two columns. One contains a group and the other contains a criterion. A group can contain many different criteria, but only one per row. I want to identify groups that contain three specific criteria (which will appear on different rows). In my case, I want to identify all groups that contain the criteria "I", "E", and "C". Groups may contain any number and combination of these and several other letters.
test <- data.frame(grp=c(1,1,2,2,2,3,3,3,4,4,4,4,4),val=c("C","I","E","I","C","E","I","A","C","I","E","E","A"))
> test
grp val
1 1 C
2 1 I
3 2 E
4 2 I
5 2 C
6 3 E
7 3 I
8 3 A
9 4 C
10 4 I
11 4 E
12 4 E
13 4 A
In the above example, I want to identify grp 2 and 4, because each of these contains the letters E, I, and C.
Thanks!
2 Answers
#1
Here's a dplyr solution. %in% is vectorized, so c("E", "I", "C") %in% val returns a logical vector of length three. For the target groups, passing that vector to all() returns TRUE. That's our filter, and we run it within each group using group_by().
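As a quick illustration of that test on its own, here is a minimal base R sketch that applies it by hand to the values of groups 1 and 2 from the question. The names required, grp1, grp2, in_grp1 and in_grp2 are made up for this example.

```r
# The three criteria every matching group must contain
required <- c("E", "I", "C")

# Values of group 1 and group 2, copied from the question's data
grp1 <- c("C", "I")
grp2 <- c("E", "I", "C")

# %in% checks each element of `required` against the group's values
in_grp1 <- required %in% grp1   # FALSE TRUE FALSE ("E" and "C" checked in turn)
in_grp2 <- required %in% grp2   # TRUE TRUE TRUE

# all() collapses the three checks into a single TRUE/FALSE per group
all(in_grp1)   # FALSE, because "E" is missing from group 1
all(in_grp2)   # TRUE
```

The dplyr pipeline applies this same test once per group: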
library(dplyr)
test %>%
group_by(grp) %>%
filter(all(c("E", "I", "C") %in% val))
# Source: local data frame [8 x 2]
# Groups: grp [2]
#
# grp val
# (dbl) (fctr)
# 1 2 E
# 2 2 I
# 3 2 C
# 4 4 C
# 5 4 I
# 6 4 E
# 7 4 E
# 8 4 A
Or if this output would be handier (thanks @Frank),
test %>%
group_by(grp) %>%
summarise(matching = all(c("E", "I", "C") %in% val))
# Source: local data frame [4 x 2]
#
# grp matching
# (dbl) (lgl)
# 1 1 FALSE
# 2 2 TRUE
# 3 3 FALSE
# 4 4 TRUE
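If dplyr is not available, the same per-group test can probably be written in base R with tapply(). This is only a sketch under that assumption, not part of the answer above; required, has_all and matching_groups are names introduced here for illustration.

```r
test <- data.frame(
  grp = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4),
  val = c("C", "I", "E", "I", "C", "E", "I", "A", "C", "I", "E", "E", "A")
)

required <- c("E", "I", "C")

# Apply the all(... %in% ...) test to the values of each group
has_all <- tapply(test$val, test$grp, function(v) all(required %in% v))

# tapply() names its result by group, so recover the group numbers from the names
matching_groups <- as.numeric(names(has_all)[has_all])
matching_groups   # 2 4

# Keep only the rows that belong to a matching group
test[test$grp %in% matching_groups, ]
```

has_all plays the role of the summarise() output, and the final subset plays the role of the filter() output.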
#2
library(data.table)
test <- data.frame(grp=c(1,1,2,2,2,3,3,3,4,4,4,4,4),val=c("C","I","E","I","C","E","I","A","C","I","E","E","A"))
setDT(test) # convert the data.frame into a data.table (by reference)
# Count how often each val occurs per group, with one column per val.
# dcast falls back to length here anyway; naming it explicitly avoids the message.
group.counts <- dcast(test, grp ~ val, fun.aggregate = length, value.var = "val")
group.counts[I>0 & E>0 & C>0,] # now filtering is easy
Results in:
grp A C E I
1: 2 0 1 1 1
2: 4 1 1 2 1
Instead of returning only the group numbers, you could also join the matching groups back to the original data to show the raw rows of each matching group:
test[group.counts[I>0 & E>0 & C>0,], .SD, on="grp" ]
This shows:
grp val
1: 2 E
2: 2 I
3: 2 C
4: 4 C
5: 4 I
6: 4 E
7: 4 E
8: 4 A
PS: To make the solution easier to follow, here are the counts for all groups:
> group.counts
grp A C E I
1: 1 0 1 0 1
2: 2 0 1 1 1
3: 3 1 0 1 1
4: 4 1 1 2 1
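The same counting idea can be sketched in base R with table(), which builds a comparable group-by-value count matrix without any packages. This is an illustrative alternative rather than the data.table method above; required, counts, present and matching_groups are names introduced here.

```r
test <- data.frame(
  grp = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4),
  val = c("C", "I", "E", "I", "C", "E", "I", "A", "C", "I", "E", "E", "A")
)

required <- c("E", "I", "C")

# Rows are groups, columns are values, cells are occurrence counts
counts <- table(grp = test$grp, val = test$val)

# TRUE where a required value occurs at least once in the group
present <- counts[, required, drop = FALSE] > 0

# A group matches when all required values are present
matching_groups <- rownames(counts)[rowSums(present) == length(required)]
matching_groups   # "2" "4"
```

Note that indexing counts by required assumes every required letter occurs somewhere in the data; if one never appears, its column does not exist and the subset fails.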