How do I find matches with agrep() when the pattern is a character vector?

Time: 2021-11-17 19:16:57

Suppose I have a string vector:

header = c("2012 Chevrolet Camaro SS", 
           "2013 Chevrolet Equinox LT", 
           "2013 Nissan Altima 2.5 SV", 
           "2009 Infiniti M35x X")

and a list of car makers

maker.list = c("Chevrolet", "Nissan", "Infiniti")

I want to use agrep() to return the index at which the car maker appears in each element of header. I want it to return

idx = c(2, 2, 2, 2) #the makers' name occurs at the 2nd position of each element 

Since the pattern is a character vector, I am thinking of using mapply or lapply to loop over it. Or maybe use an R command to turn the maker names into a regular expression like

regexp = "Chevrolet|Nissan|Infiniti" 
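
For reference, a pattern like this can be assembled from the vector with paste(). This is only a minimal sketch, assuming the maker names contain no regex metacharacters. Note that grepl() with this pattern matches exactly, so it gives up the typo tolerance of agrep() and reports only whether a maker occurs, not its word position:

```r
header = c("2012 Chevrolet Camaro SS",
           "2013 Chevrolet Equinox LT",
           "2013 Nissan Altima 2.5 SV",
           "2009 Infiniti M35x X")
maker.list = c("Chevrolet", "Nissan", "Infiniti")

# join the maker names with "|" so the regex matches any one of them
regexp = paste(maker.list, collapse = "|")

# TRUE for every header that contains one of the makers (exact match only)
has.maker = grepl(regexp, header)
```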

So far I have:

idx = lapply(maker.list, function(permaker){
   match.result = agrep(permaker, header, max.distance = 1)
   return (match.result)
})

This obviously does not work... Any ideas?

----- Update -----

I tried one of the solutions below and something strange happened.

maker.list1 = c("zap", "ford")
lapply(maker.list1, agrep, c("2011" ,"Ford", "Escape"), max.distance = 1, ignore.case = TRUE)

the result is

[[1]]
[1] 3

[[2]]
[1] 2

which says that both patterns match ("zap" is reported as matching element 3, "Escape"). This makes no sense to me. Am I missing something? PS: In my actual case, I have about 70 car makers and over 4k headers.
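
The result is likely explained by how agrep() matches: it looks for the pattern inside each element, so with max.distance = 1 "zap" is one edit away from a substring of "Escape" (drop the "z" and "ap" is found). The sketch below uses adist() from base R to show both kinds of distance. Comparing whole words with a threshold of 1 is an assumption made here for illustration, one possible way to avoid substring hits rather than the only fix:

```r
words = c("2011", "Ford", "Escape")
maker.list1 = c("zap", "ford")

# partial = TRUE measures the distance from each pattern to the closest
# substring of each word, which is the notion of distance agrep() uses
d.partial = adist(maker.list1, words, partial = TRUE, ignore.case = TRUE)
dimnames(d.partial) = list(maker.list1, words)
d.partial["zap", "Escape"]   # 1: deleting "z" leaves "ap", a substring of "escape"

# partial = FALSE (the default) compares whole strings instead
d.whole = adist(maker.list1, words, ignore.case = TRUE)
dimnames(d.whole) = list(maker.list1, words)

# keep only makers that are within one edit of an entire word
hits = which(d.whole <= 1, arr.ind = TRUE)
hits
```

With whole-word distances only "ford" against "Ford" survives the threshold, so "zap" no longer produces a hit.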

1 Solution

#1


Split each item in your header on whitespace with strsplit(), then run agrep() for each maker over the resulting words:

sapply(strsplit(header, "\\s+"), function(H) unlist(lapply(maker.list, agrep, H)) )
#[1] 2 2 2 2

If any header produces multiple hits (or none), sapply() cannot simplify the result and you will get a list instead of a vector.
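
If a fixed-length integer vector is needed regardless of how many hits each header has, one option is vapply(). This is a sketch under stated assumptions: taking the first (smallest) position and returning NA when nothing matches are choices made here, and the last header is a hypothetical example added to show the no-match case.

```r
header = c("2012 Chevrolet Camaro SS",
           "2013 Chevrolet Equinox LT",
           "2013 Nissan Altima 2.5 SV",
           "2009 Infiniti M35x X",
           "2010 Toyota Camry")      # hypothetical header with no listed maker
maker.list = c("Chevrolet", "Nissan", "Infiniti")

first.hit = vapply(strsplit(header, "\\s+"), function(words) {
  # positions of all words that approximately match any maker
  hits = unlist(lapply(maker.list, agrep, words))
  # keep the earliest position, or NA when no maker was found
  if (length(hits) == 0) NA_integer_ else min(hits)
}, integer(1))

first.hit
```

Because vapply() checks that every result is a single integer, the output stays a plain vector even when some headers match several makers or none.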
