Remove entries from a string vector that contain specific characters in R [duplicate]

This question already has an answer here:

Difference of two character vectors with substring (3 answers)

I've got two character vectors:

x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

I need to use x to remove entries from y so that only the strings that do not contain any of the x's entries remain, like this:

y <- c("kot", "kk", "y")

The code should work for any size of vectors x and y.

So far I've tried to use gsub and grepl but these only work with single strings. I've tried to create a loop to do this but the problem seems more difficult than I thought. And of course, the more sophisticated the solution is, the better, but you can assume that in this case the vectors x and y have up to 200 entries.

3 solutions

#1

We can use grep to find which values in y match any of the entries in x (joined into a single pattern with "|") and exclude them using !y %in% ...:

y[!y %in% grep(paste0(x, collapse = "|"), y, value = TRUE)]

#[1] "kot" "kk"  "y"

Or, even better, use grepl, which returns a logical vector that can index y directly:

y[!grepl(paste0(x, collapse = "|"), y)]
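To make the steps explicit, here is a minimal sketch of the same grepl approach broken into named pieces. The variable names pattern, hits and result are only illustrative. Note that each entry of x is treated as a regular expression here, so entries containing metacharacters such as "." or "+" would need escaping (or a fixed-string approach) to be matched literally.

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# Join all entries of x into one alternation pattern: "a|b|c|kt"
pattern <- paste0(x, collapse = "|")

# grepl() returns TRUE for every element of y that matches the pattern
hits <- grepl(pattern, y)

# Keep only the elements of y that matched none of the entries
result <- y[!hits]
print(result)
```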

#2

The answer given by @Ronak looks preferable to mine, but one option is to use sapply with grepl to get a matrix of matches against y, for each entry in x, then to roll that up with another call to apply.

> y[!apply(sapply(x, function(q) {grepl(q, y)}), 1, function(x) {sum(as.numeric(x)) > 0})]
[1] "kot" "kk"  "y"

Here is what I mean by matrix of matches:

> sapply(x, function(q) { grepl(q, y) })
         a     b     c    kt
[1,]  TRUE  TRUE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE
[3,] FALSE FALSE  TRUE FALSE
[4,] FALSE FALSE FALSE  TRUE
[5,] FALSE FALSE FALSE FALSE
[6,] FALSE FALSE FALSE FALSE
       ^^^^ each column is a match result for each element of x
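As a possible simplification, the roll-up over rows could also be done with rowSums instead of a second apply. This is only a sketch, and it assumes y has at least two entries so that sapply returns a matrix rather than a plain vector; the names m and result are illustrative.

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# Match matrix: one row per entry of y, one column per entry of x
m <- sapply(x, function(q) grepl(q, y))

# rowSums() counts the TRUE values in each row;
# a count of zero means no entry of x matched that element of y
result <- y[rowSums(m) == 0]
print(result)
```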

#3

This should also work:

y[Reduce("+", lapply(x, grepl, y, fixed=TRUE))==0]
# [1] "kot" "kk"  "y"
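For readers unfamiliar with Reduce, the one-liner above can be unpacked roughly as follows. This is a sketch with illustrative variable names (matches, counts, result). Because fixed = TRUE is passed to grepl, each entry of x is matched as literal text rather than as a regular expression.

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# For each entry of x, a logical vector saying which elements of y contain it.
# lapply() passes the entry as grepl's pattern; y and fixed = TRUE follow.
matches <- lapply(x, grepl, y, fixed = TRUE)

# Adding the logical vectors element-wise gives, for each element of y,
# the number of entries of x that it contains
counts <- Reduce("+", matches)

# Keep the elements of y that contain none of the entries
result <- y[counts == 0]
print(result)
```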
