Extract some text between parentheses [duplicate]

Time: 2022-09-13 00:14:28

Possible Duplicate:
Extract info inside all parenthesis in R (regex)

I have a string

df <- "Peoplesoft(id-1290)"

I would like to capture the characters between the parentheses. For example, I would like to get id-1290 from the string above.

I used this:

x <- regexpr("\\((.*)\\)", df) 

This gives me numbers like:

[1] 10

Is there an easy way to grab the text between parentheses using regex in R?

2 Answers

#1


29  

I prefer to use gsub() for this:

gsub(".*\\((.*)\\).*", "\\1", df)
[1] "id-1290"

The regex works like this:

  • Find the text inside the parentheses: not the literal parentheses in your string (matched by \\( and \\)), but my extra set of capturing parentheses, i.e. (.*)
  • Return this as a back-reference, \\1

In other words, substitute all the text in the string with the back-reference.
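As a rough illustration of how the pattern behaves on other inputs, here is a small sketch. The strings `multi` and "no brackets here" and the `[^)]*` variant are assumptions added for illustration; they are not part of the original answer.

```r
df <- "Peoplesoft(id-1290)"
multi <- "Peoplesoft(id-1290) blabla (foo)"

# Original pattern on the original string.
single <- gsub(".*\\((.*)\\).*", "\\1", df)

# The leading .* is greedy, so with several bracketed groups
# the capture ends up on the last one.
greedy <- gsub(".*\\((.*)\\).*", "\\1", multi)

# Illustrative variant: [^(]* and [^)]* cannot cross a parenthesis,
# so this keeps the first bracketed group instead.
first_only <- sub("^[^(]*\\(([^)]*)\\).*", "\\1", multi)

# With no parentheses nothing matches, and the input is returned unchanged.
unmatched <- gsub(".*\\((.*)\\).*", "\\1", "no brackets here")

print(c(single, greedy, first_only, unmatched))
```

Because an unmatched string comes back as-is, it may be worth checking with grepl() first if some inputs might not contain parentheses.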


If you want to use regexpr rather than gsub, then do this:

x <- regexpr("\\((.*)\\)", df)
x

[1] 11
attr(,"match.length")
[1] 9
attr(,"useBytes")
[1] TRUE

This returns a value of 11, i.e. the starting position of the match. Note also the match.length attribute, which indicates how many characters were matched.

You can extract this with attr:

attr(x, "match.length")
[1] 9

And then use substring to extract the characters:

substring(df, x+1, x+attr(x, "match.length")-2)
[1] "id-1290"
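The position arithmetic above can also be avoided. As a possible alternative (not part of the original answer), base R's regexec() reports the capture group positions as well, and regmatches() turns them into text. The variable names `m`, `parts` and `inside` are just illustrative.

```r
df <- "Peoplesoft(id-1290)"

# regexec() records the whole match and each capture group.
m <- regexec("\\((.*)\\)", df)

# regmatches() returns a list with one character vector per input string:
# element 1 is the whole match, element 2 is the first capture group.
parts <- regmatches(df, m)[[1]]
inside <- parts[2]

print(parts)
print(inside)
```

Here `parts[1]` still includes the literal parentheses, while `parts[2]` holds only the text between them.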

#2


3  

Here is a slightly different way, using lookbehind/ahead:

df <- "Peoplesoft(id-1290)"
regmatches(df,gregexpr("(?<=\\().*?(?=\\))", df, perl=TRUE))

The difference from Andrie's answer is that this also extracts multiple strings in brackets, e.g.:

df <- "Peoplesoft(id-1290) blabla (foo)"
regmatches(df,gregexpr("(?<=\\().*?(?=\\))", df, perl=TRUE))

Gives:

[[1]]
[1] "id-1290" "foo" 
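Since gregexpr() and regmatches() are vectorised, the same call should also work on a character vector, returning one list element per input string. A minimal sketch, assuming a made-up vector `x` that includes an element without any brackets:

```r
x <- c("Peoplesoft(id-1290) blabla (foo)", "no brackets", "a(b)")

# (?<=\\() requires a "(" just before the match, (?=\\)) a ")" just after it;
# .*? is lazy, so each match stops at the nearest closing parenthesis.
hits <- regmatches(x, gregexpr("(?<=\\().*?(?=\\))", x, perl = TRUE))

# One list element per input; strings without a match give character(0).
print(hits)
print(lengths(hits))
```

The result is a list, so something like unlist(hits) flattens it when the link back to the originating string is not needed.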
