I have the following data frame:
> my.data
A.Seats B.Seats
1 14,15 14,15,16
2 7 7,8
3 12,13 16,17
4 <NA> 10,11
I would like to check if the string within any row in column "A.Seats" is found within the same row of column "B.Seats". So the output would look something like this:
A.Seats B.Seats Check
1 14,15 14,15,16 TRUE
2 7 7,8 TRUE
3 12,13 16,17 FALSE
4 <NA> 10,11 FALSE
But I don't know how to create this table. As a start, I tried using grep:
grep(my.data$A.Seats,my.data$B.Seats)
But I receive the following output
[1] 1
Warning message:
In grep(my.data$A.Seats, my.data$B.Seats) :
argument 'pattern' has length > 1 and only the first element will be used
...and I can't get past this error. Any ideas as to how I can get the intended result?
Many Thanks
2 Solutions
#1
The "stringi" library has some vectorized functions that might be useful for something like this. I would suggest the stri_detect() function. Here's an example with some reproducible sample data. Note the difference in the values in the first and last row, and the difference in the results according to whether a regex or fixed approach was taken:
my.data <- data.frame(
A.Seats = c("14,15", "7", "12,13", NA, "14,19"),
B.Seats = c("14,15,16", "7,8", "16,17", "10,11", "14,15,16"))
my.data
# A.Seats B.Seats
# 1 14,15 14,15,16
# 2 7 7,8
# 3 12,13 16,17
# 4 <NA> 10,11
# 5 14,19 14,15,16
library(stringi)
stri_detect(my.data$B.Seats, fixed = my.data$A.Seats)
# [1] TRUE TRUE FALSE NA FALSE
stri_detect(my.data$B.Seats, regex = gsub(",", "|", my.data$A.Seats))
# [1] TRUE TRUE FALSE NA TRUE
The first option above treats each value in my.data$A.Seats as a fixed string pattern. The second option turns each value into a regular expression that matches any of its comma-separated seats.
Note that this maintains NA as NA, but that can easily be changed to FALSE if you need to.
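In case it helps, here is a minimal sketch of turning the NA into FALSE. To keep it runnable without stringi, the result vector is typed in by hand from the fixed-pattern output shown above; the names `res` and `check` are just for illustration.

```r
# Result of the fixed-pattern stri_detect() call, copied by hand
res <- c(TRUE, TRUE, FALSE, NA, FALSE)

# Option 1: overwrite the NA entries on a copy
check <- res
check[is.na(check)] <- FALSE
check
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Option 2: `%in% TRUE` yields FALSE for both FALSE and NA
res %in% TRUE
# [1]  TRUE  TRUE FALSE FALSE FALSE
```

Either form could be assigned straight to a new column, e.g. `my.data$Check <- check`.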
If you don't want to think too much about mapply, you can consider Vectorize to make a vectorized version of grepl. Something like the following should do it:
vGrepl <- Vectorize(grepl)
vGrepl(my.data$A.Seats, my.data$B.Seats) # whole A.Seats string used as the pattern
# [1] 1 1 0 NA 0
vGrepl(gsub(",", "|", my.data$A.Seats), my.data$B.Seats) # pattern is regex
# 14|15 7 12|13 <NA> 14|19
# 1 1 0 NA 1
as.logical(vGrepl(my.data$A.Seats, my.data$B.Seats)) # coerce to logical
# [1] TRUE TRUE FALSE NA FALSE
Because this calls grepl separately for each element of the vector, I don't think it will scale well, though.
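For comparison, the mapply call that Vectorize wraps can also be written out by hand. This is only a sketch under a few assumptions of mine: the columns are plain character vectors (hence stringsAsFactors = FALSE), fixed = TRUE is added so the pattern is matched literally, and isTRUE() is used so that an NA pattern comes out as FALSE.

```r
my.data <- data.frame(
  A.Seats = c("14,15", "7", "12,13", NA, "14,19"),
  B.Seats = c("14,15,16", "7,8", "16,17", "10,11", "14,15,16"),
  stringsAsFactors = FALSE)

# One grepl() call per row: is the A.Seats string a substring of B.Seats?
check <- mapply(
  function(pattern, x) isTRUE(grepl(pattern, x, fixed = TRUE)),
  my.data$A.Seats, my.data$B.Seats,
  USE.NAMES = FALSE)

check
# [1]  TRUE  TRUE FALSE FALSE FALSE
```

Keep in mind this is still substring matching, so a pattern of "7" would also match a B.Seats value such as "17,18".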
#2
This is one approach to get what you need:
> List <- lapply(my.data, function(x) strsplit(as.character(x), ","))
> transform(my.data, Check=sapply(mapply("%in%", List[[1]], List[[2]]), any))
A.Seats B.Seats Check
1 14,15 14,15,16 TRUE
2 7 7,8 TRUE
3 12,13 16,17 FALSE
4 <NA> 10,11 FALSE
Here's an alternative using grep:
> transform(my.data,
Check=sapply(suppressWarnings(mapply("grep", List[[1]], List[[2]])), any))
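If the `%in%` one-liner is hard to read, it can be unpacked into steps. A rough sketch, using the five-row sample data from answer #1; the intermediate names `a`, `b` and `hits` are my own, and vapply is swapped in for sapply only to pin the result type.

```r
my.data <- data.frame(
  A.Seats = c("14,15", "7", "12,13", NA, "14,19"),
  B.Seats = c("14,15,16", "7,8", "16,17", "10,11", "14,15,16"),
  stringsAsFactors = FALSE)

# Split each cell into its individual seat numbers
a <- strsplit(as.character(my.data$A.Seats), ",")
b <- strsplit(as.character(my.data$B.Seats), ",")

# Row by row: which A seats also appear among the B seats?
hits <- mapply(`%in%`, a, b, SIMPLIFY = FALSE)

# A row passes if at least one seat matched
my.data$Check <- vapply(hits, any, logical(1))
my.data$Check
# [1]  TRUE  TRUE FALSE FALSE  TRUE
```

Because whole seat numbers are compared, "7" does not match "17" here, and the NA row comes out as FALSE without extra handling.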