I wish to split strings at a certain character sequence while retaining that sequence at the start of the second resulting string. I can achieve almost all of the desired operation, except that I lose the characters I specify in strsplit, which I guess are called the delimiter.
Is there a way to request that strsplit retain the delimiter? Or must I use a regular expression of some kind? Thank you for any advice. This seems like a very basic question. Sorry if it is a duplicate. I prefer to use base R.
Here is an example showing what I have so far:
my.table <- read.table(text = '
model npar AICc
AA(~region+state+county+city)BB(~region+state+county+city)CC(~1) 17 11111.11
AA(~region+state+county)BB(~region+state+county)CC(~123) 14 22222.22
AA(~region+state)BB(~region+state)CC(~33) 13 33333.33
AA(~region)BB(~region)CC(~4321) 6 44444.44
', header = TRUE, stringsAsFactors = FALSE)
desired.result <- read.table(text = '
model CC npar AICc
AA(~region+state+county+city)BB(~region+state+county+city) CC(~1) 17 11111.11
AA(~region+state+county)BB(~region+state+county) CC(~123) 14 22222.22
AA(~region+state)BB(~region+state) CC(~33) 13 33333.33
AA(~region)BB(~region) CC(~4321) 6 44444.44
', header = TRUE, stringsAsFactors = FALSE)
split.model <- strsplit(my.table$model, 'CC\\(')
split.models <- matrix(unlist(split.model), ncol=2, byrow=TRUE, dimnames = list(NULL, c("model", "CC")))
desires.result2 <- data.frame(split.models, my.table[,2:ncol(my.table)])
desires.result2
# model CC npar AICc
# 1 AA(~region+state+county+city)BB(~region+state+county+city) ~1) 17 11111.11
# 2 AA(~region+state+county)BB(~region+state+county) ~123) 14 22222.22
# 3 AA(~region+state)BB(~region+state) ~33) 13 33333.33
# 4 AA(~region)BB(~region) ~4321) 6 44444.44
3 solutions
#1
9
The basic idea is to pass regular-expression look-around assertions to strsplit to get your desired result. However, it's a bit trickier than that with strsplit and positive lookahead. Read this excellent post from @JoshO'Brien for an explanation.
pattern <- "(?<=\\))(?=CC)"
strsplit(my.table$model, pattern, perl=TRUE)
# [[1]]
# [1] "AA(~region+state+county+city)BB(~region+state+county+city)"
# [2] "CC(~1)"
# [[2]]
# [1] "AA(~region+state+county)BB(~region+state+county)"
# [2] "CC(~123)"
# [[3]]
# [1] "AA(~region+state)BB(~region+state)" "CC(~33)"
# [[4]]
# [1] "AA(~region)BB(~region)" "CC(~4321)"
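To see why the lookbehind is part of the pattern, a small sketch on one string from the question can compare a lookahead-only pattern with the combined one. The names `x`, `lookahead.only` and `combined` are illustrative and not from the original answer. No output is shown for the lookahead-only call, since it is best inspected by running it.

```r
# One model string taken from the question's table
x <- "AA(~region)BB(~region)CC(~4321)"

# Lookahead alone: a zero-length match directly in front of "CC"
lookahead.only <- strsplit(x, "(?=CC)", perl = TRUE)[[1]]

# Lookbehind for ")" combined with lookahead for "CC", as in the answer above
combined <- strsplit(x, "(?<=\\))(?=CC)", perl = TRUE)[[1]]

# Print both results side by side for comparison
print(lookahead.only)
print(combined)
```

The combined pattern should reproduce the two pieces listed for the fourth row in the output above.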
Of course, I leave the task of using do.call(rbind, ...) and cbind to get the final desired.result to you.
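For completeness, here is one way the remaining do.call(rbind, ...) and cbind step might look. This is a sketch and not part of the original answer. It uses a two-row subset of the question's table, and the names `pieces` and `result` are made up for illustration.

```r
# Two rows from the question's table, to keep the example short
my.table <- read.table(text = '
model npar AICc
AA(~region+state)BB(~region+state)CC(~33) 13 33333.33
AA(~region)BB(~region)CC(~4321) 6 44444.44
', header = TRUE, stringsAsFactors = FALSE)

# Split each model string at the boundary between ")" and "CC"
pieces <- strsplit(my.table$model, "(?<=\\))(?=CC)", perl = TRUE)

# Stack the two-element vectors into a character matrix, one row per model
split.models <- do.call(rbind, pieces)
colnames(split.models) <- c("model", "CC")

# Bind the split columns to the remaining columns of the original table
result <- cbind(as.data.frame(split.models, stringsAsFactors = FALSE),
                my.table[, -1, drop = FALSE])
print(result)
```

This assumes every model string splits into exactly two pieces. If some rows lack a CC term, do.call(rbind, ...) would recycle values, so the lengths are worth checking first.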
#2
0
Almost right after I posted, I thought of using gsub to insert a space and then splitting on the space. That said, I like Arun's answer better.
my.table <- read.table(text = '
model npar AICc
AA(~region+state+county+city)BB(~region+state+county+city)CC(~1) 17 11111.11
AA(~region+state+county)BB(~region+state+county)CC(~123) 14 22222.22
AA(~region+state)BB(~region+state)CC(~33) 13 33333.33
AA(~region)BB(~region)CC(~4321) 6 44444.44
', header = TRUE, stringsAsFactors = FALSE)
my.table$model <- gsub("CC", " CC", my.table$model)
split.model <- strsplit(my.table$model, ' ')
split.models <- matrix(unlist(split.model), ncol=2, byrow=TRUE, dimnames = list(NULL, c("model", "CC")))
desires.result <- data.frame(split.models, my.table[,2:ncol(my.table)])
desires.result
# model CC npar AICc
# 1 AA(~region+state+county+city)BB(~region+state+county+city) CC(~1) 17 11111.11
# 2 AA(~region+state+county)BB(~region+state+county) CC(~123) 14 22222.22
# 3 AA(~region+state)BB(~region+state) CC(~33) 13 33333.33
# 4 AA(~region)BB(~region) CC(~4321) 6 44444.44
#3
0
... why not just tack the separator back on afterwards? It would seem to save a lot of trouble fiddling with regexes.
split.model <- lapply(strsplit(my.table$model, 'CC\\('), function(x) {
x[2] <- paste0("CC(", x[2])
x
})