Suppose I have a string vector:
header = c("2012 Chevrolet Camaro SS",
"2013 Chevrolet Equinox LT",
"2013 Nissan Altima 2.5 SV",
"2009 Infiniti M35x X")
and a list of car makers:
maker.list = c("Chevrolet", "Nissan", "Infiniti")
I want to use agrep() to return the position at which the car maker appears within each element of header. I want it to return:
idx = c(2, 2, 2, 2) #the makers' name occurs at the 2nd position of each element
Since the pattern is a vector of strings, I am thinking of using mapply or lapply to loop over it. Or maybe I could use some R command to turn the maker names into a regular expression like
regexp = "Chevrolet|Nissan|Infiniti"
So far I have:
idx = lapply(maker.list, function(permaker){
match.result = agrep(permaker, header, max.distance = 1)
return (match.result)
})
This obviously does not work... Any ideas?
Update: I tried one of the solutions below and something strange happened.
maker.list1 = c("zap", "ford")
lapply(maker.list1, agrep, c("2011" ,"Ford", "Escape"), max.distance = 1, ignore.case = TRUE)
The result is
[[1]]
[1] 3
[[2]]
[1] 2
which says that both makers match. This makes no sense to me; am I missing something? PS: In my actual case, I have about 70 car makers and over 4k headers.
1 solution
Use strsplit to split each item in your header by spaces, and then run agrep through each one:
sapply(strsplit(header, "\\s+"), function(H) unlist(lapply(maker.list, agrep, H)) )
#[1] 2 2 2 2
If you get multiple hits for any case you will get a list instead of a vector as the result.
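If you would rather always get a plain integer vector, one option is to wrap the same idea in a small helper. This is only a sketch: `maker_position` is a made-up name, and keeping the first matching word (or NA when nothing matches) is an assumption about what you want.

```r
header <- c("2012 Chevrolet Camaro SS",
            "2013 Chevrolet Equinox LT",
            "2013 Nissan Altima 2.5 SV",
            "2009 Infiniti M35x X")
maker.list <- c("Chevrolet", "Nissan", "Infiniti")

# Hypothetical helper: for each header, the position of the first word that
# approximately matches any maker, or NA if no word matches.
maker_position <- function(header, makers, max.distance = 0.1) {
  words <- strsplit(header, "\\s+")
  vapply(words, function(w) {
    # agrep() returns the indices of the words in w that match one maker
    hits <- unlist(lapply(makers, agrep, x = w, max.distance = max.distance))
    if (length(hits) == 0L) NA_integer_ else min(hits)
  }, integer(1))
}

idx <- maker_position(header, maker.list)
```

Because vapply insists on exactly one integer per header, the result stays a vector regardless of how many words match.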
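On the update in the question: agrep does approximate substring matching, so with max.distance = 1 the pattern "zap" is accepted wherever a substring is at most one edit away. "Escape" contains "cap", which is one substitution from "zap", hence the hit at position 3. The sketch below reproduces that and shows, as one possible workaround, that allowing zero edits leaves only the exact (case-insensitive) match.

```r
# Words from the update in the question.
words <- c("2011", "Ford", "Escape")

# One edit allowed: "zap" -> "cap" is a single substitution, and "cap" is a
# substring of "Escape" once case is ignored, so element 3 is reported.
loose <- agrep("zap", words, max.distance = 1, ignore.case = TRUE)

# Zero edits allowed: only exact, case-insensitive substring matches remain.
strict_zap  <- agrep("zap",  words, max.distance = 0, ignore.case = TRUE)
strict_ford <- agrep("ford", words, max.distance = 0, ignore.case = TRUE)
```

Short patterns are especially prone to this, because a single edit is a large share of a three-letter name. For short maker names you may want max.distance = 0, or a plain regular expression.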