Given a string vector vecA:
vecA <- c("Population 1222",
"Population 90over",
"population under78",
"population 99101",
"Population 1254",
"Population 78 92")
Problem
I would like to arrive at vecB, which would look like this:
vecB <- c("Population 12 - 22",
"Population 90 over",
"population under 78",
"population 99 - 101",
"Population 12 - 54",
"Population 78 - 92")
Key characteristics
vecB has the following characteristics:
- After the first two digits, a space, a dash, and a space are inserted ( - )
- If a space already separates the numbers, the dash is inserted there (78 92 becomes 78 - 92)
- For combinations like underDigitDigit, only a space is inserted: under DigitDigit
Attempts
I was thinking of using capture groups in gsub, along these lines:
gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)
but that does not work for all cases:
> t(t(gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)))
[,1]
[1,] "12"
[2,] "90"
[3,] "population under78"
[4,] "99"
[5,] "12"
[6,] "78"
t() is applied for presentational purposes only; regex101 link.
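For reference, the output above follows from two properties of gsub: a replacement of "\\2" keeps only the second capture group, and a string that does not match the pattern is returned unchanged. A minimal sketch illustrating both, using two of the inputs (the variable names x, pat, only_digits, and all_groups are introduced here for illustration):

```r
x <- c("Population 1222", "population under78")
pat <- "^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$"

# Replacing with "\\2" discards groups 1 and 3, so only the two digits remain.
# The second string does not match and is returned as-is.
only_digits <- gsub(pat, "\\2", x)

# Re-emitting all three groups keeps the whole string and lets text be
# inserted between them.
all_groups <- gsub(pat, "\\1\\2 - \\3", x)

print(only_digits)
print(all_groups)
```

This suggests the replacement string needs to reference every group, and that the under/over cases need a pattern of their own.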
1 solution
#1
Here is my suggestion: do it in two steps. 1) Add the hyphen between the numbers first, then 2) add the space between the words "over"/"under" and the number:
vecA <- c("Population 1222",
"Population 90over",
"population under78",
"population 99101",
"Population 1254",
"Population 78 92")
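# Step 1: two digits, optional whitespace, another digit -> insert " - " between them
v <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])", "\\1\\2 - \\3", vecA)
# Step 2: put a space between "over"/"under" and the adjacent number (either order)
gsub("^([[:alpha:]]+[[:blank:]]+)(?|(over|under)(\\d+)|(\\d+)(over|under))", "\\1\\2 \\3", v, perl = TRUE)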
Output of the code demo:
[1] "Population 12 - 22" "Population 90 over" "population under 78"
[4] "population 99 - 101" "Population 12 - 54" "Population 78 - 92"
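To check the result against the vecB from the question, the two steps can be wrapped in a small helper. This is only a sketch; the function name format_population is hypothetical and not part of the original answer:

```r
# Hypothetical helper; the name is an assumption made for illustration.
format_population <- function(x) {
  # Step 1: insert " - " after the first two digits when more digits follow.
  v <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])",
            "\\1\\2 - \\3", x)
  # Step 2: separate "over"/"under" from the adjacent number with a space.
  gsub("^([[:alpha:]]+[[:blank:]]+)(?|(over|under)(\\d+)|(\\d+)(over|under))",
       "\\1\\2 \\3", v, perl = TRUE)
}

vecA <- c("Population 1222", "Population 90over", "population under78",
          "population 99101", "Population 1254", "Population 78 92")
vecB <- c("Population 12 - 22", "Population 90 over", "population under 78",
          "population 99 - 101", "Population 12 - 54", "Population 78 - 92")

# Compare the transformed input with the expected vector.
matches <- identical(format_population(vecA), vecB)
print(matches)
```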
The second regex contains a branch reset group (?|...|...), which keeps the same group IDs across the alternative subpatterns. This is PCRE syntax, so it requires perl = TRUE.
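If perl = TRUE is not wanted, one possible alternative (a sketch, not part of the original answer) is to split the second step into two plain gsub calls, one per word order, so that no branch reset group is needed. The variable names below are introduced for illustration:

```r
# Sample values as they look after the first (hyphen) step.
v <- c("Population 90over", "population under78", "Population 12 - 22")

# Case 1: digits followed by over/under, e.g. "90over" -> "90 over".
step_a <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]+)(over|under)",
               "\\1\\2 \\3", v)

# Case 2: over/under followed by digits, e.g. "under78" -> "under 78".
step_b <- gsub("^([[:alpha:]]+[[:blank:]]+)(over|under)([[:digit:]]+)",
               "\\1\\2 \\3", step_a)

print(step_b)
```

The trade-off is one extra pass over the vector in exchange for patterns that work with the default regex engine.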