Given a string such as:
text <- "abcdefghijklmnopqrstuvwxyz"
I would like to chop the string into substrings of a fixed length, for example 10, and keep the remainder:
"abcdefghij"
"klmnopqrst"
"uvwxyz"
All the methods I know for creating substrings fail to return the 6-character remainder. I have tried answers from previous similar questions, such as:
> substring(text, seq(1, nchar(text), 10), seq(10, nchar(text), 10))
[1] "abcdefghij" "klmnopqrst" ""
Any advice on how to obtain all the substrings of the desired length, plus any remainder, would be much appreciated.
3 solutions
#1
8
The vectors that you use for the first and last arguments in substring can exceed the number of characters in the string without any error, warning, or other problem. So you can do:
text <- "abcdefghijklmnopqrstuvwxyz"
sq <- seq.int(to = nchar(text), by = 10)
substring(text, sq, sq + 9)
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
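The attempt in the question returns an empty third element because its end vector, seq(10, nchar(text), 10), has only two elements and is recycled against the three start positions. Below is a minimal sketch that wraps the approach above in a reusable function. The name chunk_string and the empty-string guard are assumptions for illustration, not part of base R or of the original answer.

```r
# Hypothetical helper (not part of base R): split `x` into pieces of
# `width` characters, keeping any shorter remainder as the last piece.
chunk_string <- function(x, width) {
  stopifnot(length(x) == 1L, width >= 1L)
  # Guard: seq.int(1, 0, by = width) would fail for an empty string.
  if (nchar(x) == 0L) return(character(0))
  starts <- seq.int(1L, nchar(x), by = width)
  # End positions may run past nchar(x); substring() simply stops at
  # the end of the string.
  substring(x, starts, starts + width - 1L)
}

text <- "abcdefghijklmnopqrstuvwxyz"

# End vector from the question: only two elements for three starts, so it
# is recycled and the third piece becomes substring(text, 21, 10), i.e. "".
seq(10, nchar(text), 10)
# [1] 10 20

chunk_string(text, 10)
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
```

Computing the end positions from the start positions keeps both vectors the same length, so no recycling can occur.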
#2
10
Try
strsplit(text, '(?<=.{10})', perl=TRUE)[[1]]
#[1] "abcdefghij" "klmnopqrst" "uvwxyz"
Or you can use the stringi package for a faster approach:
library(stringi)
stri_extract_all_regex(text, '.{1,10}')[[1]]
#[1] "abcdefghij" "klmnopqrst" "uvwxyz"
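If installing a package is not an option, the same extraction pattern can likely be applied with base R alone. The sketch below is an alternative to the stringi call, not part of the original answer; the variable names n, pattern, and pieces are illustrative.

```r
text <- "abcdefghijklmnopqrstuvwxyz"
n <- 10  # illustrative chunk width

# Build the pattern ".{1,10}" from the chosen width.
pattern <- sprintf(".{1,%d}", n)

# gregexpr() locates every non-overlapping match of the pattern;
# regmatches() extracts the matched substrings.
pieces <- regmatches(text, gregexpr(pattern, text))[[1]]
pieces
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
```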
#3
3
Here is a way using strapplyc from the gsubfn package, involving a fairly simple regular expression. It works because .{1,10} is greedy: at each position it matches the longest string that is no longer than 10 characters:
library(gsubfn)
strapplyc(text, ".{1,10}", simplify = c)
giving:
[1] "abcdefghij" "klmnopqrst" "uvwxyz"
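To see why the longest-match behavior matters, the greedy pattern can be compared with its lazy counterpart. This sketch uses base R's gregexpr and regmatches with perl = TRUE instead of strapplyc, purely so that it runs without extra packages; the variable names are illustrative.

```r
text <- "abcdefghijklmnopqrstuvwxyz"

# Greedy quantifier: each match takes as many characters as allowed (up to 10).
greedy <- regmatches(text, gregexpr(".{1,10}", text, perl = TRUE))[[1]]
nchar(greedy)
# [1] 10 10  6

# Lazy quantifier: each match takes as few characters as allowed (just 1).
lazy <- regmatches(text, gregexpr(".{1,10}?", text, perl = TRUE))[[1]]
length(lazy)
# [1] 26
```

Only the greedy form yields full-width chunks followed by the remainder.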
Visualization: this regular expression is simple enough that it does not really require a visualization, but here is one anyway:
[Figure: diagram of the regular expression .{1,10}]