Possible Duplicate:
Extract info inside all parenthesis in R (regex)
I have a string
df
Peoplesoft(id-1290)
I would like to capture the characters between the parentheses. For example, I would like to get id-1290 from the string above.
I used this:
x <- regexpr("\\((.*)\\)", df)
This is giving me numbers like
[1] 10
Is there an easy way to grab the text between parentheses using regex in R?
2 solutions
#1
29
I prefer to use gsub() for this:
gsub(".*\\((.*)\\).*", "\\1", df)
[1] "id-1290"
The regex works like this:
- Find the text inside the parentheses. These are not the literal parentheses in your string but my extra set, the capture group (.*)
- Return this as a back-reference, \\1
In other words, substitute all the text in the string with the back-reference.
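As a rough sketch of how this pattern behaves on inputs other than the one in the question (the extra test strings here are made up for illustration): when nothing matches, gsub() returns the input unchanged, and when there are several bracketed groups the greedy leading .* means only the last one is kept.

```r
pattern <- ".*\\((.*)\\).*"

# One parenthesised group: the whole string is replaced by the capture.
one <- gsub(pattern, "\\1", "Peoplesoft(id-1290)")

# No parentheses (hypothetical input): nothing matches, so the string
# comes back unchanged rather than as NA or "".
none <- gsub(pattern, "\\1", "Peoplesoft")

# Two groups (hypothetical input): the leading greedy .* runs up to the
# last "(", so only the final group survives.
two <- gsub(pattern, "\\1", "Peoplesoft(id-1290) blabla (foo)")

print(c(one, none, two))
```

So it is probably worth checking for a match first (for example with grepl()) if some strings may not contain parentheses.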
If you want to use regexpr rather than gsub, then do this:
x <- regexpr("\\((.*)\\)", df)
x
[1] 11
attr(,"match.length")
[1] 9
attr(,"useBytes")
[1] TRUE
This returns 11, i.e. the starting position of the match. Note the match.length attribute, which indicates how many characters were matched.
You can extract this with attr:
attr(x, "match.length")
[1] 9
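And then use substring to extract the characters. The match includes both parentheses, so start one character after the match position and stop one character before its end: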
substring(df, x+1, x+attr(x, "match.length")-2)
[1] "id-1290"
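If the position arithmetic feels fiddly, one alternative (not part of the answer above, just a sketch using base R's regexec() and regmatches()) is to let R hand back the capture group directly:

```r
df <- "Peoplesoft(id-1290)"

# regexec() records the positions of the full match and of each capture group.
m <- regexec("\\((.*)\\)", df)

# regmatches() turns those positions into text: element 1 is the full match
# including the parentheses, element 2 is the first capture group.
parts <- regmatches(df, m)[[1]]
inside <- parts[2]

print(parts)
print(inside)
```

Here parts holds both the full match and the captured text, so no +1 / -2 offsets are needed.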
#2
3
Here is a slightly different way, using lookbehind/ahead:
df <- "Peoplesoft(id-1290)"
regmatches(df,gregexpr("(?<=\\().*?(?=\\))", df, perl=TRUE))
The difference from Andrie's answer is that this also extracts multiple strings in brackets, e.g.:
df <- "Peoplesoft(id-1290) blabla (foo)"
regmatches(df,gregexpr("(?<=\\().*?(?=\\))", df, perl=TRUE))
Gives:
[[1]]
[1] "id-1290" "foo"
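The same call also works on a character vector. A small sketch, with made-up input strings, showing that each element gets its own list entry and that strings without brackets yield an empty result:

```r
# Hypothetical inputs: two groups, no groups, one group.
x <- c("Peoplesoft(id-1290) blabla (foo)", "no brackets here", "a(b)")

# Lookbehind/lookahead need the PCRE engine, hence perl = TRUE.
hits <- regmatches(x, gregexpr("(?<=\\().*?(?=\\))", x, perl = TRUE))

# Number of bracketed strings found per input element.
counts <- sapply(hits, length)

print(hits)
print(counts)
```

Because the result is a list, unlist(hits) can be used if a flat vector of all matches is what you want.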