Regular expressions using the elements of a vector

Time: 2022-06-21 15:42:10

Even at the risk of this question being labeled a duplicate, I am going to ask, since none of the related questions I have checked solve my problem...

I have a labs vector and I want to find the elements that are exact matches to 3 groups stored in a groups variable.

set.seed(1)
labs <- sample(c(rep('BC-89HX',3), rep('BC-89HX with 2% Puricare + 5% Merquat',3), rep('Own SH',4)), 10)
labs
groups <- c('BC-89HX','BC-89HX with 2% Puricare + 5% Merquat','Own SH')

I want to identify the "BC-89HX" group elements (not the "BC-89HX with 2% Puricare + 5% Merquat" ones)

grep(groups[1], labs, value = TRUE, fixed = TRUE) # finds more elements than the ones I need
grep(paste(groups[1], "$", sep = ""), labs, value = TRUE, fixed = TRUE) # does not work
grep(paste("\\b", groups[1], "\\b", sep = ""), labs, value = TRUE, fixed = TRUE) # does not work

Any help?

1 Answer

The solution is to make sure that "BC-89HX" is the only text in the string. By pasting ^ and $ around it, we anchor the pattern to the start and end of the string.

grep(paste0("^", groups[1], "$"), labs, value=TRUE) 
#[1] "BC-89HX" "BC-89HX" "BC-89HX"

In this case we cannot use fixed = TRUE, because ^ and $ are metacharacters that mark the start and end of the string. With fixed = TRUE they are treated as literal characters, which the elements of 'labs' do not contain.
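
One caveat with the anchored regex: it works for groups[1] only because that label contains no regex metacharacters. groups[2] contains "+", which is read as a quantifier. The sketch below illustrates this and shows one possible workaround, quoting the label with \Q...\E under perl = TRUE. The labs vector is rebuilt without sampling so the result does not depend on the seed, and the names unquoted and quoted are illustrative only. The quoting would break if a label itself contained "\E".

```r
# Labels as in the question; the order does not matter for this check.
groups <- c("BC-89HX", "BC-89HX with 2% Puricare + 5% Merquat", "Own SH")
labs <- rep(groups, times = c(3, 3, 4))

# "+" in groups[2] acts as a quantifier, so the anchored pattern
# no longer describes the literal label and matches nothing.
unquoted <- grep(paste0("^", groups[2], "$"), labs, value = TRUE)

# Assumption: perl = TRUE, where \Q...\E makes the enclosed text literal
# while ^ and $ outside it keep their anchoring meaning.
quoted <- grep(paste0("^\\Q", groups[2], "\\E$"), labs,
               value = TRUE, perl = TRUE)

length(unquoted)  # 0
length(quoted)    # 3
```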

Another option is to use == or %in%, since we are comparing whole fixed strings rather than matching a substring within a string.

labs[labs == groups[1]]
#[1] "BC-89HX" "BC-89HX" "BC-89HX"

labs[labs == groups[2]]
#[1] "BC-89HX with 2% Puricare + 5% Merquat" "BC-89HX with 2% Puricare + 5% Merquat" "BC-89HX with 2% Puricare + 5% Merquat"
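
The lines above only show ==. As a minimal sketch of the %in% variant, plus two related base R calls (which() for positions and match() for a group index), something like the following should work. The labs vector is again rebuilt without sampling, and the names wanted, kept, idx and grp are illustrative only.

```r
groups <- c("BC-89HX", "BC-89HX with 2% Puricare + 5% Merquat", "Own SH")
labs <- rep(groups, times = c(3, 3, 4))

# Keep the labels that are exactly equal to any of several groups.
wanted <- groups[c(1, 3)]
kept <- labs[labs %in% wanted]

# Positions instead of values, comparable to grep() without value = TRUE.
idx <- which(labs == groups[1])

# Index of the matching group for every label (NA if a label is in no group).
grp <- match(labs, groups)

length(kept)  # 7
idx           # 1 2 3
```

Because these comparisons never go through the regex engine, characters such as "+" or "%" in the labels need no special handling.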

Update

If we really want to use grep with fixed = TRUE, one way is to paste the same characters onto both the pattern and the strings, so that they act as literal start and end markers, i.e.

labs[grep(paste0("^", groups[2], "$"), paste0("^", labs, "$"), fixed = TRUE)]
#[1] "BC-89HX with 2% Puricare + 5% Merquat" "BC-89HX with 2% Puricare + 5% Merquat" "BC-89HX with 2% Puricare + 5% Merquat"
labs[grep(paste0("^", groups[1], "$"), paste0("^", labs, "$"), fixed = TRUE)]
#[1] "BC-89HX" "BC-89HX" "BC-89HX"
