I have this character vector, mystring, with the delimiter "_". The condition is: if there are two or more delimiters, I want to split at the second delimiter; if there is only one delimiter, I want to split at ".ReCal". I want to get the result shown below.
mystring<-c("MODY_60.2.ReCal.sort.bam","MODY_116.21_C4U.ReCal.sort.bam","MODY_116.3_C2RX-1-10.ReCal.sort.bam","MODY_116.4.ReCal.sort.bam")
Expected result:
"MODY_60.2" "MODY_116.21" "MODY_116.3" "MODY_116.4"
6 answers
#1
11
You can do this using gsubfn:
library(gsubfn)
f <- function(x, y, z) if (z == "_") y else strsplit(x, ".ReCal", fixed = TRUE)[[1]][[1]]
gsubfn("([^_]+_[^_]+)(.).*", f, mystring, backref=2)
# [1] "MODY_60.2" "MODY_116.21" "MODY_116.3" "MODY_116.4"
This also handles strings with more than two "_" characters, splitting at the second one. For example:
mystring<-c("MODY_60.2.ReCal.sort.bam",
"MODY_116.21_C4U.ReCal.sort.bam",
"MODY_116.3_C2RX-1-10.ReCal.sort.bam",
"MODY_116.4.ReCal.sort.bam",
"MODY_116.4_asdfsadf_1212_asfsdf",
"MODY_116.5.ReCal_asdfsadf_1212_asfsdf", # split by second "_", leaving ".ReCal"
"MODY")
gsubfn("([^_]+_[^_]+)(.).*", f, mystring, backref=2)
# [1] "MODY_60.2" "MODY_116.21" "MODY_116.3" "MODY_116.4"
# [5] "MODY_116.4" "MODY_116.5.ReCal" "MODY"
In the function f, x is the full match (here, the whole string), and y and z are the two capture groups. If z is "_", the string has a second delimiter and y is the text before it; otherwise the function splits x at the alternative string, ".ReCal".
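To see which pieces the function receives, the capture groups can be inspected directly. This is a minimal base R sketch, assuming regexec() with perl = TRUE as a stand-in for what gsubfn passes to f; the names pattern, samples and pieces are illustrative only.

```r
# Same pattern as in the gsubfn call above
pattern <- "([^_]+_[^_]+)(.).*"
samples <- c("MODY_116.21_C4U.ReCal.sort.bam", "MODY_60.2.ReCal.sort.bam")

# Each list element holds: full match (x), first group (y), second group (z)
pieces <- regmatches(samples, regexec(pattern, samples, perl = TRUE))

pieces[[1]]  # z is "_", so f would return y ("MODY_116.21")
pieces[[2]]  # z is "m", so f would fall back to splitting x at ".ReCal"
```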
#2
5
With the stringr package:
library(stringr)
str_extract(mystring, '.*?_.*?(?=_)|^.*?_.*(?=\\.ReCal)')
[1] "MODY_60.2" "MODY_116.21" "MODY_116.3" "MODY_116.4"
It also works with more than two delimiters.
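If stringr is not available, roughly the same extraction can be sketched in base R. This assumes regexpr() with perl = TRUE behaves like str_extract() for this pattern; the names x, pattern, m and extracted are illustrative, and the NA for a non-matching string is filled in by hand.

```r
x <- c("MODY_60.2.ReCal.sort.bam", "MODY_116.21_C4U.ReCal.sort.bam",
       "MODY_116.4_asdfsadf_1212_asfsdf", "MODY")
pattern <- ".*?_.*?(?=_)|^.*?_.*(?=\\.ReCal)"

# regexpr() returns -1 where nothing matches; leave NA in those positions
m <- regexpr(pattern, x, perl = TRUE)
extracted <- rep(NA_character_, length(x))
extracted[m > 0] <- regmatches(x, m)
extracted
# [1] "MODY_60.2"   "MODY_116.21" "MODY_116.4"  NA
```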
#3
5
Perl/PCRE has a branch reset feature, (?|...), that lets capturing groups in different alternatives share the same group number, so they are treated as a single capturing group.
IMO, this feature is elegant when you want to supply different alternatives.
x <- c('MODY_60.2.ReCal.sort.bam', 'MODY_116.21_C4U.ReCal.sort.bam',
'MODY_116.3_C2RX-1-10.ReCal.sort.bam', 'MODY_116.4.ReCal.sort.bam',
'MODY_116.4_asdfsadf_1212_asfsdf', 'MODY_116.5.ReCal_asdfsadf_1212_asfsdf', 'MODY')
sub('^(?|([^_]*_[^_]*)_.*|(.*)\\.ReCal.*)$', '\\1', x, perl = TRUE)
# [1] "MODY_60.2" "MODY_116.21" "MODY_116.3" "MODY_116.4"
# [5] "MODY_116.4" "MODY_116.5.ReCal" "MODY"
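For comparison, here is a sketch of what branch reset saves. Without it, the two alternatives become groups 1 and 2, and the replacement has to name both. This assumes that a group which did not take part in the match expands to an empty string in sub() with perl = TRUE; the names with_reset and without_reset are illustrative.

```r
x <- c("MODY_60.2.ReCal.sort.bam", "MODY_116.21_C4U.ReCal.sort.bam",
       "MODY_116.5.ReCal_asdfsadf_1212_asfsdf", "MODY")

# Branch reset: both alternatives capture into group 1
with_reset <- sub("^(?|([^_]*_[^_]*)_.*|(.*)\\.ReCal.*)$", "\\1", x, perl = TRUE)

# Plain non-capturing group: the alternatives are groups 1 and 2,
# so the replacement concatenates both (the unused one is assumed empty)
without_reset <- sub("^(?:([^_]*_[^_]*)_.*|(.*)\\.ReCal.*)$", "\\1\\2", x, perl = TRUE)

identical(with_reset, without_reset)
```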
#4
4
gsub('^(.*\\.\\d+).*','\\1',mystring)
[1] "MODY_60.2" "MODY_116.21" "MODY_116.3" "MODY_116.4"
#5
2
^([^_\\n]*_[^_\\n]*)(?:_.*|\\.ReCal[^_]*)$
You can do this simply with gsub, without any complex regex: just replace the match with \\1. See the demo:
https://regex101.com/r/wL4aB6/1
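The answer gives only the pattern, so here is a minimal sketch of the gsub call it describes. The perl = TRUE argument and the variable names are assumptions, not part of the original answer.

```r
mystring <- c("MODY_60.2.ReCal.sort.bam", "MODY_116.21_C4U.ReCal.sort.bam",
              "MODY_116.3_C2RX-1-10.ReCal.sort.bam", "MODY_116.4.ReCal.sort.bam")

# Group 1 is everything before the second "_" or before ".ReCal"
pattern <- "^([^_\\n]*_[^_\\n]*)(?:_.*|\\.ReCal[^_]*)$"
trimmed <- gsub(pattern, "\\1", mystring, perl = TRUE)
trimmed
# [1] "MODY_60.2"   "MODY_116.21" "MODY_116.3"  "MODY_116.4"
```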
#6
2
A little longer, but needs less regular expression knowledge:
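library(stringr)
# Locate every "_" in each string (one start/end matrix per element)
indx <- str_locate_all(mystring, "_")
for (i in seq_along(indx)) {
  if (nrow(indx[[i]]) < 2) {
    # Fewer than two delimiters: keep everything before ".ReCal"
    mystring[i] <- strsplit(mystring[i], ".ReCal", fixed = TRUE)[[1]][1]
  } else {
    # Two or more delimiters: keep everything before the second "_"
    mystring[i] <- substr(mystring[i], start = 1, stop = indx[[i]][2, "start"] - 1)
  }
}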
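The same logic can also be sketched in base R as a small function that leaves the input untouched. The helper name split_name and the vector files are hypothetical, and gregexpr() stands in for str_locate_all().

```r
files <- c("MODY_60.2.ReCal.sort.bam", "MODY_116.21_C4U.ReCal.sort.bam",
           "MODY_116.3_C2RX-1-10.ReCal.sort.bam", "MODY_116.4.ReCal.sort.bam")

split_name <- function(s) {
  # Positions of every "_" in s; gregexpr() returns -1 if there are none
  pos <- gregexpr("_", s, fixed = TRUE)[[1]]
  if (sum(pos > 0) >= 2) {
    # Two or more delimiters: keep everything before the second "_"
    substr(s, 1, pos[2] - 1)
  } else {
    # Otherwise: keep everything before ".ReCal"
    strsplit(s, ".ReCal", fixed = TRUE)[[1]][1]
  }
}

short_names <- vapply(files, split_name, character(1), USE.NAMES = FALSE)
short_names
# [1] "MODY_60.2"   "MODY_116.21" "MODY_116.3"  "MODY_116.4"
```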