I match and remove 4-digit numbers that are preceded and followed by whitespace with:
str12 <- "coihr 1234 &/()= jngm 34 ljd"
sub("\\s\\d{4}\\s", "", str12)
[1] "coihr&/()= jngm 34 ljd"
But every attempt to invert this and extract the number instead fails. I want:
[1] 1234
Does anyone have a clue?
PS: I know how to do it with {stringr}, but I am wondering whether it's possible with {base} only.
require(stringr)
gsub("\\s", "", str_extract(str12, "\\s\\d{4}\\s"))
[1] "1234"
3 Solutions
#1
5
It's possible to capture a group in a regex using (). Taking the same example:
str12 <- "coihr 1234 &/()= jngm 34 ljd"
gsub(".*\\s(\\d{4})\\s.*", "\\1", str12)
[1] "1234"
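One caveat with the capture-group approach is that sub() and gsub() return the input unchanged when the pattern does not match, so a non-matching string comes back whole rather than as a missing value. The sketch below guards against that with grepl(); the helper name extract_4digit and the no_match string are hypothetical and not part of the original answer.

```r
str12 <- "coihr 1234 &/()= jngm 34 ljd"
no_match <- "coihr 123 jngm 34 ljd"   # made-up string without a 4-digit number

pat <- ".*\\s(\\d{4})\\s.*"

# With no match, sub() hands back the whole input string
unchanged <- sub(pat, "\\1", no_match)

# Hypothetical helper: return NA instead when the pattern is absent
extract_4digit <- function(s) {
  ifelse(grepl(pat, s), sub(pat, "\\1", s), NA_character_)
}

result <- extract_4digit(c(str12, no_match))
print(result)
```

Also note that the leading .* is greedy, so when a string contains several qualifying numbers this pattern captures the last one, not the first.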
#2
6
regmatches(), only available since R-2.14.0, allows you to "extract or replace matched substrings from match data obtained by regexpr, gregexpr or regexec".
Here are examples of how you could use regmatches() to extract either the first whitespace-delimited 4-digit substring in your input character string, or all such substrings.
## Example strings and pattern
x <- "coihr 1234 &/()= jngm 34 ljd" # string with 1 matching substring
xx <- "coihr 1234 &/()= jngm 3444 6789 ljd" # string with >1 matching substring
pat <- "(?<=\\s)(\\d{4})(?=\\s)"
## Use regexpr() to extract *1st* matching substring
as.numeric(regmatches(x, regexpr(pat, x, perl=TRUE)))
# [1] 1234
as.numeric(regmatches(xx, regexpr(pat, xx, perl=TRUE)))
# [1] 1234
## Use gregexpr() to extract *all* matching substrings
as.numeric(regmatches(xx, gregexpr(pat, xx, perl=TRUE))[[1]])
# [1] 1234 3444 6789
(Note that this will return numeric(0) for character strings that do not contain a substring matching your criteria.)
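The same calls also work on a whole character vector, since gregexpr() returns one match object per input string. The following is a minimal sketch under that assumption; the vector strings and its third element are made up for illustration.

```r
pat <- "(?<=\\s)\\d{4}(?=\\s)"
strings <- c("coihr 1234 &/()= jngm 34 ljd",
             "coihr 1234 &/()= jngm 3444 6789 ljd",
             "no four-digit number here")   # made-up non-matching string

# regmatches() returns a list with one character vector per input string
matches <- regmatches(strings, gregexpr(pat, strings, perl = TRUE))

# Convert each element; strings without a match yield numeric(0)
numbers <- lapply(matches, as.numeric)
counts <- vapply(numbers, length, integer(1))
print(counts)
```

The lookarounds matter here: because they do not consume the surrounding whitespace, adjacent numbers such as 3444 and 6789, which share a single space, are both found.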
#3
0
I'm pretty naive about regex in general, but here's an ugly way to do it in base:
# if it's always in the same spot as in your example
unlist(strsplit(str12, split = " "))[2]
# or if it can occur in various places
str13 <- unlist(strsplit(str12, split = " "))
str13[!is.na(as.integer(str13)) & nchar(str13) == 4] # issues warning
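The warning comes from as.integer() coercing non-numeric tokens to NA. A possible warning-free variant, sketched here as an alternative and not taken from the original answer, tests each token against a four-digit pattern with grepl(); the names tokens and four_digit are illustrative.

```r
str12 <- "coihr 1234 &/()= jngm 34 ljd"

# Split on single spaces, as in the answer above
tokens <- unlist(strsplit(str12, split = " "))

# Keep tokens made of exactly four digits; no coercion happens, so no warning
four_digit <- tokens[grepl("^[0-9]{4}$", tokens)]

number <- as.integer(four_digit)
print(number)
```

This check is also stricter than the coercion test: a token such as "-123" has four characters and converts to an integer, so the original filter would keep it, while the pattern above rejects it.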