Removing parts of a string between substrings in R when the substrings occur multiple times

Time: 2021-11-27 20:05:04

In a string

string="aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee"

I would like to remove all parts that occur between START and STOP, yielding

"aaaaaaaaacccccccceeeeeee"

If I try gsub("START(.*)STOP","",string), though, this gives me "aaaaaaaaaeeeeeee".

What would be the correct way to do this, allowing for multiple occurrences of START and STOP?

2 solutions

#1


3  

Add a ? after the .* to make the match lazy (non-greedy).

gsub("START.*?STOP", "", string)
# [1] "aaaaaaaaacccccccceeeeeee"
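
To see what the ? changes, here is a small sketch that runs both patterns on the string from the question. The variable names greedy and lazy are only illustrative. A greedy .* runs from the first START to the last STOP, while the lazy .*? stops at the nearest STOP, so each START...STOP pair is removed separately.

```r
string <- "aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee"

# Greedy: .* consumes as much as possible, so one match spans
# from the first START to the last STOP
greedy <- gsub("START.*STOP", "", string)

# Lazy: .*? consumes as little as possible, so every START is
# matched only up to the nearest following STOP
lazy <- gsub("START.*?STOP", "", string)

print(greedy)  # [1] "aaaaaaaaaeeeeeee"
print(lazy)    # [1] "aaaaaaaaacccccccceeeeeee"
```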

#2


0  

Not nearly as elegant as Ananda's answer, but there are some other ways using the stringr & plyr packages.

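library(stringr)
library(plyr)

# Locate every START and STOP; column 1 is the match start, column 2 the match end
starts <- ldply(str_locate_all(string, 'START'))[, 1]
stops <- ldply(str_locate_all(string, 'STOP'))[, 2]
# Cut out each START...STOP segment (assumes the markers are paired and not nested)
segments <- str_sub(string, starts, stops)
# Remove the segments one by one; fixed() matches them literally rather than as regex
result <- string
for (segment in segments) result <- str_replace(result, fixed(segment), '')
result
# [1] "aaaaaaaaacccccccceeeeeee"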
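
The same locate-and-cut idea can also be sketched in base R, without extra packages. This is only one possible way to do it, assuming the START and STOP markers are properly paired and not nested. The names m, segments, kept and result are made up for the example.

```r
string <- "aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee"

# Locate every non-overlapping START...STOP span (lazy match)
m <- gregexpr("START.*?STOP", string)

# The matched segments themselves, useful for checking what will be removed
segments <- regmatches(string, m)[[1]]

# invert = TRUE returns the pieces between the matches instead
kept <- regmatches(string, m, invert = TRUE)[[1]]

# Glue the kept pieces back together
result <- paste(kept, collapse = "")
print(result)  # [1] "aaaaaaaaacccccccceeeeeee"
```

Inspecting segments before pasting makes it easy to confirm that only the intended spans are dropped.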
