How can I efficiently split the following string on the first comma using base R?
x <- "I want to split here, though I don't want to split elsewhere, even here."
strsplit(x, ???)
Desired outcome (2 strings):
[[1]]
[1] "I want to split here" "though I don't want to split elsewhere, even here."
Thank you in advance.
EDIT: I didn't think to mention this, but the solution needs to generalize to a column or vector of strings, as in:
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")
The outcome can be two columns, one long vector (that I can take every other element of), or a list of strings in which each element ([[n]]) holds two strings.
Apologies for the lack of clarity.
5 Answers
#1
11
Here's what I'd probably do. It may seem hacky, but since sub() and strsplit() are both vectorized, it will also work smoothly when handed multiple strings.
XX <- "SoMeThInGrIdIcUlOuS"
strsplit(sub(",\\s*", XX, x), XX)
# [[1]]
# [1] "I want to split here"
# [2] "though I don't want to split elsewhere, even here."
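Since the question asks for something that generalizes to a vector, here is a minimal sketch of the same idea applied to the y from the question's edit. It assumes the placeholder XX never occurs in the data. The second half shows a placeholder-free variant using regmatches() with invert = TRUE; that is a swapped-in base R technique, not part of this answer. The names parts and parts2 are only illustrative.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Variant 1: replace only the first comma (plus trailing whitespace) with a
# placeholder, then split on that placeholder as a fixed string.
XX <- "SoMeThInGrIdIcUlOuS"   # assumed never to appear in the data
parts <- strsplit(sub(",\\s*", XX, y), XX, fixed = TRUE)

# Bind the two-element pieces into a two-column character matrix.
do.call(rbind, parts)

# Variant 2: regexpr() locates only the first match in each string, and
# invert = TRUE returns the text on either side of that match.
parts2 <- regmatches(y, regexpr(",\\s*", y), invert = TRUE)
```

Both variants return a list with one element per input string. A string that contains no comma comes back as a single-element vector, so the rbind step would need padding in that case.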
#2
8
From the stringr package:
library(stringr)
str_split_fixed(x, pattern = ', ', n = 2)
# [,1]
# [1,] "I want to split here"
# [,2]
# [1,] "though I don't want to split elsewhere, even here."
(That's a matrix with one row and two columns.)
#3
3
Here is yet another solution, with a regular expression to capture what is before and after the first comma.
x <- "I want to split here, though I don't want to split elsewhere, even here."
library(stringr)
str_match(x, "^(.*?),\\s*(.*)")[,-1]
# [1] "I want to split here"
# [2] "though I don't want to split elsewhere, even here."
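Because the question asked for base R, the same capture-group idea can be sketched without stringr using regexec() and regmatches(). This is an assumption-laden equivalent rather than this answer's own code: it uses the pattern "^([^,]*),\\s*(.*)$" (everything up to the first comma, then the rest) in place of the lazy quantifier, and the names m and out are illustrative.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# regexec() records the positions of the full match and of each capture group;
# regmatches() then extracts them as c(full match, group 1, group 2).
m <- regmatches(y, regexec("^([^,]*),\\s*(.*)$", y))

# Keep only the two capture groups. Strings with no comma yield NA, NA
# because indexing an empty vector with 2:3 returns missing values.
out <- t(vapply(m, function(v) v[2:3], character(2)))
out
```

The result is a character matrix with one row per input string and two columns, which matches the "two columns" outcome the question allows.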
#4
2
library(stringr)
str_sub(x, end = min(str_locate(string = x, ',') - 1))
This will get the first bit you want. Change the start= and end= in str_sub to get whatever else you want.
Such as:
str_sub(x, start = min(str_locate(string = x, ',') + 1))
and wrap in str_trim to get rid of the leading space:
str_trim(str_sub(x, start = min(str_locate(string = x, ',') + 1)))
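Since min() collapses the located positions to a single number, the code above handles one string at a time. A rough base R sketch of the same locate-then-substring idea that works on a whole vector might look like the following; it swaps in regexpr(), substr(), substring() and trimws(), and the names pos, first and rest are illustrative only.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Position of the first comma in each string (-1 where there is none).
pos <- regexpr(",", y, fixed = TRUE)

# Everything before the first comma.
first <- substr(y, 1, pos - 1)

# Everything after the first comma, with surrounding whitespace removed.
rest <- trimws(substring(y, pos + 1))

data.frame(first, rest, stringsAsFactors = FALSE)
```

This sketch assumes every string contains a comma. Where pos is -1, first would be an empty string and rest the whole input, so such rows may need separate handling.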
#5
2
This works, but I like Josh Obrien's better:
y <- strsplit(x, ",")
sapply(y, function(x) data.frame(x = x[1],
    z = paste(x[-1], collapse = ",")), simplify = FALSE)
Inspired by chase's response.
A number of people gave non-base approaches, so I figured I'd add the one I usually use (though in this case I needed a base R answer):
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")
library(reshape2)
colsplit(y, ",", c("x","z"))