Extracting from a string using regular expressions

Time: 2021-10-12 21:17:59

I have a string:

s <- "test.test AS field1, ablh.blah AS field2, faslk.lsdf AS field3"

I want to convert to:

"field1, field2, field3"

I know that the regular expression (\w+)(?:,|$) matches the strings I want ('field1,' etc.), but I can't figure out how to extract them with gsub.

2 Solutions

#1


10  

## Preparation
s <- "test.test AS field1, ablh.blah AS field2, faslk.lsdf AS field3"
pat <- "(\\w+)(?:,|$)"  ## Note the doubly-escaped \\w

## Use the powerful gregexpr/regmatches one-two punch
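m <- gregexpr(pat, s)
## Each full match keeps its trailing comma ("field1,", "field2,", "field3"),
## so joining the matches with a single space gives the desired string
paste(regmatches(s, m)[[1]], collapse=" ")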
# [1] "field1, field2, field3"
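
Since the question asks specifically about gsub, here is a minimal sketch of one way it might be done: rather than extracting the aliases, delete everything that is not an alias. It assumes every field has the form "<source> AS <alias>" and that the source names contain no spaces or commas; the variable name `aliases` is only illustrative.

```r
s <- "test.test AS field1, ablh.blah AS field2, faslk.lsdf AS field3"

# Remove each "<source> AS " prefix; only the aliases and their separators remain.
# "[^ ,]+" assumes source names contain neither spaces nor commas.
aliases <- gsub("[^ ,]+ AS ", "", s)
aliases
# [1] "field1, field2, field3"
```

This works here because gsub replaces every match, so the three prefixes are stripped in one call. Input that does not follow the assumed shape would need a stricter pattern.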

#2


0  

With strapplyc in the gsubfn package, this can be done with a particularly simple regular expression that extracts each run of word characters following " AS ". If a field can contain non-word characters, replace \\w with an appropriate expression, for example [^ ,], which matches any character that is not a space or a comma:

> library(gsubfn)
> strapplyc(s, " AS (\\w+)", simplify = toString)[[1]]
[1] "field1, field2, field3"
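
If installing gsubfn is not an option, the same idea can probably be approximated in base R. The sketch below is an assumption-laden alternative, not part of the original answer: it uses a PCRE lookbehind to match only the text after " AS ", together with the [^ ,] character class mentioned above. The variable name `fields` is illustrative.

```r
s <- "test.test AS field1, ablh.blah AS field2, faslk.lsdf AS field3"

# (?<= AS ) is a lookbehind: it requires " AS " before the match without
# including it in the result. Lookbehinds need perl = TRUE.
m <- gregexpr("(?<= AS )[^ ,]+", s, perl = TRUE)

# regmatches returns the matched aliases; toString joins them with ", "
fields <- toString(regmatches(s, m)[[1]])
fields
# [1] "field1, field2, field3"
```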
