Col
WBU-ARGU*06:03:04
WBU-ARDU*08:01:01
WBU-ARFU*11:03:05
WBU-ARFU*03:456
I have a column with 75 rows of values like those in Col above. I am not sure how to use gsub or sub to keep everything up to and including the integers after the first colon.
Expected output:
Col
WBU-ARGU*06:03
WBU-ARDU*08:01
WBU-ARFU*11:03
WBU-ARFU*03:456
I tried this but it doesn't seem to work:
gsub("*..:","", df$col)
3 Answers
#1
3
The following may help you here too.
sub("([^:]*):([^:]*).*","\\1:\\2",df$dat)
The output is as follows.
> sub("([^:]*):([^:]*).*","\\1:\\2",df$dat)
[1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456b"
The input data frame is built as follows.
dat <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456b")
df <- data.frame(dat)
Explanation: the annotated breakdown below is for explanation only and is not meant to be run as written.
sub(" ##sub() replaces only the first match in each element (gsub() would replace every match).
([^:]*) ##Capture group 1: zero or more non-colon characters, i.e. everything up to the first colon.
: ##A literal colon.
([^:]*) ##Capture group 2: zero or more non-colon characters, i.e. everything up to the next colon (or the end of the string).
.*" ##Match the rest of the string (the second colon and everything after it) so that it is dropped.
,"\\1:\\2" ##The replacement: \\1 is the value of group 1 and \\2 is the value of group 2, joined by a colon.
,df$dat) ##The input vector: the dat column of the data frame df.
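As a minimal runnable sketch of the pattern broken down above (base R only; the names x, res and no_colon are placeholders, not from the original answer), the call below keeps the first two colon-separated fields. Because the second group is [^:]* rather than \\d+, a trailing letter in the second field is kept.

```r
# Sample values taken from the answer above; `x` is a placeholder name.
x <- c("WBU-ARGU*06:03:04", "WBU-ARDU*08:01:01",
       "WBU-ARFU*11:03:05", "WBU-ARFU*03:456b")

# Keep the first two colon-separated fields and drop the rest.
res <- sub("([^:]*):([^:]*).*", "\\1:\\2", x)
print(res)

# An element without any colon does not match and is returned unchanged.
no_colon <- sub("([^:]*):([^:]*).*", "\\1:\\2", "WBU-ARGU")
print(no_colon)
```

If the second field must be digits only, the patterns in the other answers are stricter about that.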
#2
2
You may use
df$col <- sub("(\\d:\\d+):\\d+$", "\\1", df$col)
See the regex demo
Details
(\\d:\\d+) - Capturing group 1 (its value is accessible via \1 in the replacement pattern, written "\\1" in an R string): a digit, a colon and 1+ digits
: - a colon
\\d+ - 1+ digits
$ - end of string.
col <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456")
sub("(\\d:\\d+):\\d+$", "\\1", col)
## => [1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456"
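One property of this pattern is worth checking on your own data: it is anchored at the end of the string, so it strips only a final colon-plus-digits field. The sketch below uses hypothetical probe values (only the first comes from the question) to show what happens when the string ends in a letter or has more than three fields.

```r
# Hypothetical inputs chosen to probe the pattern; only the first is from the question.
probe <- c("WBU-ARGU*06:03:04",     # three fields, all digits
           "WBU-ARFU*03:456b",      # second field ends in a letter
           "WBU-ARGU*06:03:04:05")  # four fields

# The pattern needs ":digits" right before the end of the string.
anchored <- sub("(\\d:\\d+):\\d+$", "\\1", probe)
print(anchored)
# The second element is unchanged (no match), and the third loses
# only its last field, giving "WBU-ARGU*06:03:04".
```

For the data in the question this makes no difference, since every value has at most three all-digit fields.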
Alternative approach:
df$col <- sub("^(.*?:\\d+).*", "\\1", df$col)
See the regex demo
Here,
^ - start of string
(.*?:\\d+) - Group 1: any 0+ chars, as few as possible (due to the lazy *? quantifier), then : and 1+ digits
.* - the rest of the string.
However, it should be used with the PCRE regex engine, so pass perl=TRUE:
col <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456")
sub("^(.*?:\\d+).*", "\\1", col, perl=TRUE)
## => [1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456"
See the R online demo.
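To illustrate how the lazy pattern behaves beyond the sample data, here is a small sketch with hypothetical probe values (only the first comes from the question). Because the match starts at the beginning of the string and stops after the digits that follow the first colon, everything after that point is dropped, however many fields follow.

```r
# Hypothetical inputs; only the first is from the question.
probe <- c("WBU-ARGU*06:03:04",     # three fields, all digits
           "WBU-ARFU*03:456b",      # second field ends in a letter
           "WBU-ARGU*06:03:04:05")  # four fields

# Lazy .*? stops at the first colon that is followed by digits.
lazy <- sub("^(.*?:\\d+).*", "\\1", probe, perl = TRUE)
print(lazy)
# The trailing "b" in the second element is removed as well,
# because group 1 ends after the digits.
```

Whether dropping a trailing letter is desirable depends on the data; the question only shows all-digit fields.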
#3
1
sub("(\\d+:\\d+):\\d+$", "\\1", df$Col)
[1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456"
Alternatively, match what you want (instead of substituting out what you don't want) with stringi:
stringi::stri_extract_first(df$Col, regex = "[A-Z-\\*]+\\d+:\\d+")
Slightly more concise with stringr:
stringr::str_extract(df$Col, "[A-Z-\\*]+\\d+:\\d+")
# or
stringr::str_extract(df$Col, "[\\w-*]+\\d+:\\d+")
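The same "match what you want" idea can also be sketched in base R with regexpr() and regmatches(), for cases where loading a package is not wanted. The pattern ^[^:]*:\\d+ and the variable names here are assumptions made for this sketch, not part of the answer above.

```r
col <- c("WBU-ARGU*06:03:04", "WBU-ARDU*08:01:01",
         "WBU-ARFU*11:03:05", "WBU-ARFU*03:456")

# regexpr() locates the first match in each element;
# regmatches() pulls the matched text out.
m <- regexpr("^[^:]*:\\d+", col)
extracted <- regmatches(col, m)
print(extracted)

# Elements with no match are dropped, so the result can be shorter
# than the input.
mixed <- c("WBU-ARGU*06:03:04", "no-colon")
shorter <- regmatches(mixed, regexpr("^[^:]*:\\d+", mixed))
print(shorter)
```

Since regmatches() drops non-matching elements while the stringi and stringr extractors return NA for them, assigning the result back into a data frame column is only safe when every row matches.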