I have the following string: "PRODUCT colgate good but not goodOKAY"
I want to extract all the words between PRODUCT and OKAY.
4 answers
#1
16
This can be done with sub:
s <- "PRODUCT colgate good but not goodOKAY"
sub(".*PRODUCT *(.*?) *OKAY.*", "\\1", s)
giving:
[1] "colgate good but not good"
No packages are needed.
Here is the regular expression (the original post linked a Debuggex visualization of it):
.*PRODUCT *(.*?) *OKAY.*
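To see how the pattern behaves beyond the single example, here is a minimal sketch applied to a character vector. The second element is a made-up string, added only to show what happens when the markers are absent.

```r
# The second element is a hypothetical input with no PRODUCT/OKAY markers
s <- c("PRODUCT colgate good but not goodOKAY",
       "no markers here")

# sub() replaces the whole match with capture group 1 (the lazy "(.*?)").
# When the pattern does not match, the input is returned unchanged.
res <- sub(".*PRODUCT *(.*?) *OKAY.*", "\\1", s)
print(res)
```

Because an unmatched string comes back untouched, it may be worth checking with grepl first if the input is not guaranteed to contain both markers.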
#2
12
You can use gsub:
vec <- "PRODUCT colgate good but not goodOKAY"
gsub(".*PRODUCT\\s*|OKAY.*", "", vec)
# [1] "colgate good but not good"
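One thing to watch with this approach: only whitespace after PRODUCT is removed. The sketch below uses a hypothetical input with a space before OKAY and trailing text after it, and shows a variant of the pattern (my assumption, not part of the answer) that also strips whitespace in front of OKAY.

```r
# Hypothetical input: a space before OKAY and extra text after it
vec <- "PRODUCT colgate good OKAY and more"

# Pattern from the answer: the space before OKAY is kept
with_space <- gsub(".*PRODUCT\\s*|OKAY.*", "", vec)

# Variant: \\s* in front of OKAY removes that whitespace as well
trimmed <- gsub(".*PRODUCT\\s*|\\s*OKAY.*", "", vec)

print(c(with_space, trimmed))
```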
#3
11
x = "PRODUCT colgate good but not goodOKAY"
library(stringr)
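# In stringr >= 1.0 patterns use the ICU engine, which supports lookarounds,
# so the deprecated perl() wrapper is not needed
str_extract(string = x, pattern = "(?<=PRODUCT).*(?=OKAY)")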
(?<=PRODUCT) -- look behind the match for PRODUCT
.* -- match everything except new lines
(?=OKAY) -- look ahead to match OKAY
I should add that you don't need the stringr package for this; the base functions sub and gsub work fine. I use stringr for its consistency of syntax: whether I'm extracting, replacing, detecting, etc., the function names are predictable and understandable, and the arguments are in a consistent order. I use stringr because it saves me from needing the documentation every time.
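The same lookaround pattern can also be used without stringr. This is a minimal sketch, assuming base R's regexpr with perl = TRUE (lookarounds need the PCRE engine) together with regmatches, plus trimws to drop the space that follows PRODUCT.

```r
x <- "PRODUCT colgate good but not goodOKAY"

# perl = TRUE switches to PCRE, which supports lookbehind and lookahead
m <- regexpr("(?<=PRODUCT).*(?=OKAY)", x, perl = TRUE)

# regmatches() returns the matched substring; the lookarounds keep
# PRODUCT and OKAY out of the match, but the leading space remains
raw <- regmatches(x, m)

# trimws() (base R >= 3.2.0) strips the surrounding whitespace
res <- trimws(raw)
print(res)
```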
#4
7
You could use the rm_between function from the qdapRegex package. It takes a string and a left and right boundary as follows:
x <- "PRODUCT colgate good but not goodOKAY"
library(qdapRegex)
rm_between(x, "PRODUCT", "OKAY", extract=TRUE)
## [[1]]
## [1] "colgate good but not good"