Avoiding the loop in string replacement?

Time: 2021-01-05 22:33:05

I've got data, a character vector (eventually I'll collapse it, so I don't care if it stays a vector or if it's treated as a single string), a vector of patterns, and a vector of replacements. I want each pattern in the data to be replaced by its respective replacement. I got it done with stringr and a for loop, but is there a more R-like way to do it?

library(stringr)
start_string <- sample(letters[1:10], 10)
my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")
str_replace(start_string, pattern = my_pattern, replacement = my_replacement)
# bad lengths, doesn't work

str_replace(paste0(start_string, collapse = ""),
    pattern = my_pattern, replacement = my_replacement)
# vector output, not what I want in this case

my_result <- start_string
for (i in seq_along(my_pattern)) {
    my_result <- str_replace(my_result,
        pattern = my_pattern[i], replacement = my_replacement[i])
}
> my_result
 [1] "[this was a c]"  "[this was an a]" "e"               "g"               "h"               "[this was a b]" 
 [7] "d"               "j"               "f"               "i"   

# This is what I want, but is there a better way?

In my case, I know each pattern will occur at most once, but not every pattern will occur. I know I could use str_replace_all if patterns might occur more than once; I hope a solution would also provide that option. I'd also like a solution that uses my_pattern and my_replacement so that it could be part of a function with those vectors as arguments.

2 solutions

#1


3  

I'll bet there's another way to do this, but my first thought was gsubfn:

my_repl <- function(x){
    switch(x,a = "[this was an a]",
             b = "[this was a b]",
             c = "[this was a c]",
             z = "[this was a z]")
}

library(gsubfn)    
start_string <- sample(letters[1:10], 10)
gsubfn("a|b|c|z",my_repl,x = start_string)

If the patterns you are searching for are acceptably valid names for list elements, this will also work:

names(my_replacement) <- my_pattern
gsubfn("a|b|c|z",as.list(my_replacement),start_string)

Edit

But frankly, if I really had to do this a lot in my own code, I would probably just do the for loop thing, wrapped in a function. Here's a simple version using sub and gsub rather than the functions from stringr:

vsub <- function(pattern,replacement,x,all = TRUE,...){
  FUN <- if (all) gsub else sub
  for (i in seq_len(min(length(pattern),length(replacement)))){
    x <- FUN(pattern = pattern[i],replacement = replacement[i],x,...)
  }
  x
}

vsub(my_pattern,my_replacement,start_string)
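
For anyone who wants the same sequential behaviour without writing the loop explicitly, here is a minimal sketch that folds the pattern/replacement pairs over x with base R's Reduce. The name vsub_reduce and the small fixed input are my own, chosen only for illustration; it still does one pass per pattern, so it is a restyling of the loop rather than a faster method.

```r
# Sketch only: vsub_reduce is a hypothetical name, not part of any package.
# It applies each (pattern, replacement) pair in turn, like the loop above.
vsub_reduce <- function(pattern, replacement, x, all = TRUE, ...) {
  FUN <- if (all) gsub else sub
  n <- min(length(pattern), length(replacement))
  Reduce(
    function(acc, i) FUN(pattern[i], replacement[i], acc, ...),
    seq_len(n),
    init = x
  )
}

# Fixed input (an assumption for the example) so the result is reproducible.
fixed_input <- c("a", "b", "c", "d")
result <- vsub_reduce(c("a", "b", "z"), c("[A]", "[B]", "[Z]"), fixed_input)
print(result)
```

Patterns that never occur (here "z") simply leave x unchanged, and all = FALSE switches to sub so only the first occurrence in each element is replaced.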

But of course, one of the reasons that there isn't a well-known built-in function for this is probably that sequential replacements like this can be pretty fragile, because they are so order dependent:

vsub(rev(my_pattern),rev(my_replacement),start_string)
 [1] "i"                                          "[this w[this was an a]s [this was an a] c]"
 [3] "[this was an a]"                            "g"                                         
 [5] "j"                                          "d"                                         
 [7] "f"                                          "[this w[this was an a]s [this was an a] b]"
 [9] "h"                                          "e"      
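
Because start_string is sampled at random, the output above changes from run to run. The following sketch reproduces the same effect on a fixed two-element input; apply_in_order is a hypothetical helper written just for this illustration, and it uses fixed = TRUE so the patterns are treated as plain text.

```r
# Hypothetical helper for illustration: apply replacements one after another.
apply_in_order <- function(pattern, replacement, x) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, fixed = TRUE)
  }
  x
}

x    <- c("a", "c")
pats <- c("a", "c")
reps <- c("[this was an a]", "[this was a c]")

# "a" first: the later "c" step finds no "c" in "[this was an a]".
forward <- apply_in_order(pats, reps, x)

# "c" first: the later "a" step rewrites the a's inside "[this was a c]".
backward <- apply_in_order(rev(pats), rev(reps), x)

print(forward)
print(backward)
```

The second element of backward comes out as "[this w[this was an a]s [this was an a] c]", because the text inserted by an earlier replacement is itself visible to the later ones.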

#2


1  

Here's an option based on gregexpr, regmatches, and regmatches<-. Do be aware that there are limits to the length of regular expressions that can be matched, so this won't work if you try to match too many long patterns with it.

replaceSubstrings <- function(patterns, replacements, X) {
    pat <- paste(patterns, collapse="|")
    m <- gregexpr(pat, X)
    regmatches(X, m) <- 
        lapply(regmatches(X,m),
               function(XX) replacements[match(XX, patterns)])
    X
}

## Try it out
patterns <- c("cat", "dog")
replacements <- c("tiger", "coyote")
sentences <- c("A cat", "Two dogs", "Raining cats and dogs")
replaceSubstrings(patterns, replacements, sentences)
## [1] "A tiger"                    "Two coyotes"               
## [3] "Raining tigers and coyotes"
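
One caveat worth sketching: the lookup match(XX, patterns) compares the matched text with the pattern strings themselves, so the function above assumes the patterns are literal text. If a pattern contains a regex metacharacter such as ".", it may match text that is not in patterns. Below is a hedged variant that escapes metacharacters first; escape_regex and replace_literals are hypothetical names of my own, and the escaping expression is a commonly used idiom rather than anything from the original answer.

```r
# Hypothetical helper: backslash-escape regex metacharacters so that each
# pattern is matched as literal text.
escape_regex <- function(s) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", s)
}

# Same single-pass idea as replaceSubstrings, restricted to literal patterns.
replace_literals <- function(patterns, replacements, X) {
  pat <- paste(escape_regex(patterns), collapse = "|")
  m <- gregexpr(pat, X)
  regmatches(X, m) <- lapply(
    regmatches(X, m),
    function(hit) replacements[match(hit, patterns)]
  )
  X
}

# Example data is made up for illustration.
out <- replace_literals(
  c("1.5", "kg"),
  c("one and a half", "kilograms"),
  c("1.5 kg", "125 kg")
)
print(out)
```

Without the escaping step, the pattern "1.5" would also match "125", and the lookup in patterns would then find no entry for it.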
