Matching a range within a string using a regular expression

Date: 2022-09-13 16:31:59

I have a list of codes, each consisting of a letter followed by two digits, and I would like to extract the codes that start with a certain letter and whose digits fall within a certain range. Let's say I have codes like these:

A01
A03
A06
A12
A99
B01
C09

I would like to extract A[01-12], that is, the codes that start with A and whose number is between 01 and 12, so I end up with 4 codes. How do I do that in R? I looked around for an answer to this question but couldn't find anything relevant. Thanks for your help.

2 solutions

#1


Here is a quick example of how you might split your identifier into two columns for filtering.

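library(tidyverse)

# Example codes: one letter followed by two digits
df <- data.frame(Code = c("A01", "A03", "A06", "A12", "A99", "B01", "C09"))

# Split each code into its letter and its numeric part
df <- df %>%
  mutate(Code1 = substr(Code, 1, 1),
         Code2 = as.numeric(substr(Code, 2, 3)))

# Keep codes that start with "A" and fall in the range 01-12
df %>%
  filter(Code1 == "A", Code2 >= 1, Code2 <= 12)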

It has the advantage of being flexible for filtering and you could remove the columns after filtering should you wish.
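
If you would rather not load the tidyverse or add helper columns, the same split-and-filter idea can be sketched in base R. This is only a minimal sketch: the names `codes`, `prefix`, `number`, and `selected` are illustrative and not part of the answer above, and it assumes every code is exactly one letter followed by two digits.

```r
# Base R sketch of the split-and-filter approach (no packages needed).
# Assumption: each code is exactly one letter followed by two digits.
codes <- c("A01", "A03", "A06", "A12", "A99", "B01", "C09")

# Split each code into its leading letter and its two-digit number.
prefix <- substr(codes, 1, 1)
number <- as.integer(substr(codes, 2, 3))

# Keep codes that start with "A" and whose number lies in 1..12.
selected <- codes[prefix == "A" & number >= 1 & number <= 12]
print(selected)
```

Because the number is compared numerically, changing the range only means changing the two bounds.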

#2


Another way could be:

string <- c("A01", "A03", "A06", "A12", "A99", "B01", "C09")

# "A" followed by 01-09 or 10-12, anchored to the whole string
string[grepl("^A(0[1-9]|1[0-2])$", string)]

[1] "A01" "A03" "A06" "A12"
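
Writing a pattern by hand gets awkward when the range changes often (say, 07 to 23). One possible alternative, sketched below, is to generate every code in the range and test membership instead. The helper `codes_in_range` and its arguments are hypothetical names introduced for illustration, and the sketch assumes two-digit, zero-padded numbers.

```r
# Hypothetical helper: build every code in the requested range,
# then keep the elements of x that appear in that set.
# Assumption: numbers are zero-padded to two digits (e.g. "A01").
codes_in_range <- function(x, letter, from, to) {
  wanted <- sprintf("%s%02d", letter, from:to)  # "A01", "A02", ..., "A12"
  x[x %in% wanted]
}

string <- c("A01", "A03", "A06", "A12", "A99", "B01", "C09")
result <- codes_in_range(string, "A", 1, 12)
print(result)
```

This trades the regular expression for an explicit lookup set, so the bounds can be passed as plain numbers.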
