How to use regex in R to find the pattern next to a given string

Date: 2022-05-05 19:20:41

I have a string formatted like "segmentation_level1_id_10" and would like to extract the level number associated with it (i.e. the number directly after the word "level").

I have a solution that does this in two steps (first it extracts the pattern level\\d+, then it replaces "level" with an empty string), but I would like to know whether it's possible to do this in one step with just str_extract.

Example below:

library(stringr)

segmentation_id <- "segmentation_level1_id_10"

# Two steps: extract "level1", then strip the literal "level" prefix
segmentation_level <- str_replace(str_extract(segmentation_id, "level\\d+"), "level", "")
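For comparison, the same two-step idea can be written a little more compactly with stringr's str_remove, which removes the matched pattern instead of replacing it with "". This is a sketch assuming stringr 1.3.0 or later (where str_remove is available); the names level_chr and level_num are illustrative only.

```r
library(stringr)

segmentation_id <- "segmentation_level1_id_10"

# Step 1: pull out "level1"; step 2: drop the literal "level" prefix
level_chr <- str_remove(str_extract(segmentation_id, "level\\d+"), "level")

# Convert the extracted text to an integer for later use
level_num <- as.integer(level_chr)
level_num  # 1
```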

1 Answer

#1

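One way to do it is to use the str_extract function from the stringr library with a regex featuring a lookbehind, (?<=level), which requires "level" to appear right before the digits without making it part of the match: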

> library(stringr)
> s <- "segmentation_level1_id_10"
> str_extract(s, "(?<=level)\\d+")
[1] "1"
> ## or, to make sure "level" is preceded by "_":
> str_extract(s, "(?<=_level)\\d+")
[1] "1"
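Because str_extract is vectorized, the same lookbehind pattern works on a whole vector of ids at once and returns NA for inputs that contain no "_level<digits>". A small sketch; the extra ids below are made up for illustration.

```r
library(stringr)

ids <- c("segmentation_level1_id_10",
         "segmentation_level12_id_3",
         "segmentation_id_7")   # hypothetical id with no "level" part

# The lookbehind requires "_level" before the digits but leaves it out of the match
lvl <- str_extract(ids, "(?<=_level)\\d+")
lvl              # c("1", "12", NA)

# Convert to integers; the NA is kept
as.integer(lvl)  # c(1L, 12L, NA)
```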

Or use str_match, which returns the text of captured groups (column 2 holds the first group):

> str_match(s, "_level(\\d+)")[,2]
[1] "1"
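To see why [,2] is used: str_match returns a character matrix whose first column is the whole match and whose following columns are the capture groups. A short sketch; the string "no_level_here" is a made-up non-matching example.

```r
library(stringr)

s <- "segmentation_level1_id_10"

# Column 1 = whole match, column 2 = first capture group
m <- str_match(s, "_level(\\d+)")
m[, 1]  # "_level1"
m[, 2]  # "1"

# Inputs without a match give NA in every column
str_match(c(s, "no_level_here"), "_level(\\d+)")[, 2]  # c("1", NA)
```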

It can also be done in base R with gsub, using the same capturing mechanism as str_match plus a backreference (\\1) that puts the captured text back into the replacement result:

> gsub("^.*level(\\d+).*", "\\1", s)
[1] "1"
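One caveat with the gsub approach: when the pattern does not match, the input string is returned unchanged rather than NA. The sketch below shows that behaviour (using sub, which is enough here since there is only one replacement) and a base R alternative with regexpr() and regmatches(), where perl = TRUE enables the lookbehind. The second id is a made-up non-matching example.

```r
s <- c("segmentation_level1_id_10", "segmentation_id_7")

# sub/gsub leave non-matching strings untouched
sub("^.*level(\\d+).*", "\\1", s)  # c("1", "segmentation_id_7")

# regmatches() returns only the matched parts; non-matches are dropped
regmatches(s, regexpr("(?<=level)\\d+", s, perl = TRUE))  # "1"
```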
