Insert a character at a specific location in a string

Time: 2021-08-06 06:30:20

I would like to insert an extra character (or a new string) at a specific location in a string. For example, I want to insert d at the fourth location in abcefg to get abcdefg.

Now I am using:

old <- "abcefg"
n <- 4
paste(substr(old, 1, n-1), "d", substr(old, n, nchar(old)), sep = "")

I could write a one-line simple function for this task, but I am just curious if there is an existing function for that.
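
For reference, the one-line function mentioned above might be sketched as follows. The name insert_at and its argument names are hypothetical; this is not an existing base R function, just the paste/substr call from above wrapped up.

```r
# Hypothetical helper (not part of base R): insert `insert` so that it
# starts at character position `n` of `x`.
insert_at <- function(x, insert, n) {
  paste0(substr(x, 1, n - 1), insert, substr(x, n, nchar(x)))
}

insert_at("abcefg", "d", 4)
# [1] "abcdefg"
```

With n = 1 the text is prepended, and with n = nchar(x) + 1 it is appended, because substr returns an empty string for an empty range.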

6 Answers

#1


44  

You can do this with regular expressions and gsub.

gsub('^([a-z]{3})([a-z]+)$', '\\1d\\2', old)
# [1] "abcdefg"

If you want to do this dynamically, you can create the expressions using paste:

letter <- 'd'
lhs <- paste0('^([a-z]{', n-1, '})([a-z]+)$')
rhs <- paste0('\\1', letter, '\\2')
gsub(lhs, rhs, old)
# [1] "abcdefg"

As per DWin's comment, you may want this to be more general.

gsub('^(.{3})(.*)$', '\\1d\\2', old)

This way any three characters will match, not only lower-case letters. DWin also suggests using sub instead of gsub. That way you don't have to worry about the ^ as much, since sub only replaces the first match. But I like to be explicit in regular expressions and only move to more general ones as I understand them and find a need for more generality.
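
To illustrate the difference between the two patterns, here is a small sketch with a mixed-case input. The variable name mixed is only for illustration and is not part of the original example.

```r
mixed <- "ABCefg"  # starts with upper-case letters, so [a-z]{3} cannot match

# The lower-case-only pattern finds no match, so the input is returned unchanged.
sub('^([a-z]{3})([a-z]+)$', '\\1d\\2', mixed)
# [1] "ABCefg"

# The general pattern accepts any three leading characters.
sub('^(.{3})(.*)$', '\\1d\\2', mixed)
# [1] "ABCdefg"
```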


As Greg Snow noted, you can also use a lookbehind assertion, which matches the position right after the first three characters rather than the characters themselves:

sub( '(?<=.{3})', 'd', old, perl=TRUE )

You could also build my dynamic gsub pattern above using sprintf rather than paste0:

lhs <- sprintf('^([a-z]{%d})([a-z]+)$', n-1) 

Or, for his lookbehind expression used with sub:

lhs <- sprintf('(?<=.{%d})',n-1)
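
Putting the sprintf and lookbehind pieces together, a reusable version might look like the sketch below. The function name insert_lookbehind and its arguments are hypothetical. It assumes pos is at least 2 and that the inserted text contains no backslashes, since sub interprets those in the replacement.

```r
# Hypothetical wrapper around the lookbehind approach shown above.
# Assumes pos >= 2 and no backslashes in `insert`.
insert_lookbehind <- function(x, insert, pos) {
  pattern <- sprintf('(?<=.{%d})', pos - 1)  # position preceded by pos - 1 characters
  sub(pattern, insert, x, perl = TRUE)       # sub() replaces only the first such position
}

insert_lookbehind("abcefg", "d", 4)
# [1] "abcdefg"

# A string shorter than pos - 1 characters has no matching position
# and is returned unchanged.
insert_lookbehind("ab", "d", 4)
# [1] "ab"
```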

#2


9  

The stringi package to the rescue once again! It offers the simplest and most elegant solution among those presented.

The stri_sub function lets you extract part of a string and also replace that part, like this:

library(stringi)
x <- "abcde"
stri_sub(x, 1, 3) # from first to third character
# [1] "abc"
stri_sub(x, 1, 3) <- 1 # substitute from first to third character
x
# [1] "1de"

But if you do this:

x <- "abcde"
stri_sub(x, 3, 2) # from 3 to 2 so... zero ?
# [1] ""
stri_sub(x, 3, 2) <- 1 # substitute from 3 to 2 ... hmm
x
# [1] "ab1cde"

then no characters are removed, but the new one is inserted. Isn't that cool? :)

#3


8  

@Justin's answer is the way I'd actually approach this because of its flexibility, but this could also be a fun approach.

You can treat the string as "fixed width format" and specify where you want to insert your character:

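paste(read.fwf(textConnection(old), 
               c(4, nchar(old)), as.is = TRUE), 
      collapse = "d")
# Note: a first width of 4 splits after the fourth character ("abce" | "fg");
# use n - 1 as the first width to insert at position n instead.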

Particularly nice is the output when using sapply, since you get to see the original string as the "name".

newold <- c("some", "random", "words", "strung", "together")
sapply(newold, function(x) paste(read.fwf(textConnection(x), 
                                          c(4, nchar(x)), as.is = TRUE), 
                                 collapse = "-WEE-"))
#            some          random           words          strung        together 
#   "some-WEE-NA"   "rand-WEE-om"    "word-WEE-s"   "stru-WEE-ng" "toge-WEE-ther" 

#4


3  

Your original way of doing this (i.e. splitting the string at an index and pasting in the inserted text) could be made into a generic function like so:

split_str_by_index <- function(target, index) {
  index <- sort(index)
  substr(rep(target, length(index) + 1),
         start = c(1, index),
         stop = c(index -1, nchar(target)))
}

#Taken from https://stat.ethz.ch/pipermail/r-help/2006-March/101023.html
interleave <- function(v1,v2)
{
  ord1 <- 2*(1:length(v1))-1
  ord2 <- 2*(1:length(v2))
  c(v1,v2)[order(c(ord1,ord2))]
}

insert_str <- function(target, insert, index) {
  insert <- insert[order(index)]
  index <- sort(index)
  paste(interleave(split_str_by_index(target, index), insert), collapse="")
}

Example usage:

> insert_str("1234567890", c("a", "b", "c"), c(5, 9, 3))
[1] "12c34a5678b90"

This allows you to insert a vector of characters at the locations given by a vector of indexes. The split_str_by_index and interleave functions are also useful on their own.

Edit:

I revised the code to allow for indexes in any order. Before, indexes needed to be in ascending order.
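
As a cross-check, the same result can be reached with a different technique: repeating the question's substr/paste step once per index, starting from the largest index so that each insertion leaves the remaining, smaller positions untouched. The function name insert_right_to_left is hypothetical, and this is only a sketch, not a replacement for the functions above.

```r
# Hypothetical alternative: insert one piece at a time, from the
# largest index down, so earlier positions are never shifted.
insert_right_to_left <- function(target, insert, index) {
  for (i in order(index, decreasing = TRUE)) {
    target <- paste0(substr(target, 1, index[i] - 1),
                     insert[i],
                     substr(target, index[i], nchar(target)))
  }
  target
}

insert_right_to_left("1234567890", c("a", "b", "c"), c(5, 9, 3))
# [1] "12c34a5678b90"
```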

#5


0  

I've made a custom function called substr1 to deal with extracting, replacing and inserting characters in a string. Run this code at the start of every session. Feel free to try it out and let me know if it needs to be improved.

# extraction
substr1 <- function(x,y) {
  z <- sapply(strsplit(as.character(x),''),function(w) paste(na.omit(w[y]),collapse=''))
  dim(z) <- dim(x)
  return(z) }

# substitution + insertion
`substr1<-` <- function(x,y,value) {
  names(y) <- c(value,rep('',length(y)-length(value)))
  z <- sapply(strsplit(as.character(x),''),function(w) {
    v <- seq(w)
    names(v) <- w
    paste(names(sort(c(y,v[setdiff(v,y)]))),collapse='') })
  dim(z) <- dim(x)
  return(z) }

# demonstration
abc <- 'abc'
substr1(abc,1)
# "a"
substr1(abc,c(1,3))
# "ac"
substr1(abc,-1)
# "bc"
substr1(abc,1) <- 'A'
abc
# "Abc"
abc <- 'abc' # reset before the next example
substr1(abc,1.5) <- 'A'
abc
# "aAbc"
abc <- 'abc' # reset again
substr1(abc,c(0.5,2,3)) <- c('A','B')
abc
# "AaB"

#6


0  

It took me some time to understand the regular expression; afterwards, I adapted it to the numbers I had.

The end result was:

old <- "89580000"
gsub('^([0-9]{5})([0-9]+)$', '\\1-\\2', old)
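
The same digit pattern can be parameterised with sprintf, as in the first answer. The sketch below assumes the input consists only of digits; the function name insert_dash is hypothetical.

```r
# Hypothetical helper: put a dash after the first `n` digits of `x`.
insert_dash <- function(x, n) {
  pattern <- sprintf('^([0-9]{%d})([0-9]+)$', n)
  sub(pattern, '\\1-\\2', x)
}

insert_dash("89580000", 5)
# [1] "89580-000"

# Inputs that do not fit the pattern (here, too few digits) come back unchanged.
insert_dash("8958", 5)
# [1] "8958"
```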
