How to use gsub() to replace a string exactly

Date: 2021-01-10 16:50:14

I have a corpus: txt = "a patterned layer within a microelectronic pattern." I would like to replace the exact term "pattern" with "form". I tried this code:

txt_replaced = gsub("pattern","form",txt)

However, the resulting string in txt_replaced is: "a formed layer within a microelectronic form."

As you can see, "patterned" is wrongly changed to "formed" because the substring "pattern" inside "patterned" also matches.

Is it possible to replace the string exactly using gsub()? That is, only a term that matches exactly should be replaced.

The result I would like is: "a patterned layer within a microelectronic form."

Many thanks!

1 answer

#1


25  

As @koshke noted, a very similar question has been answered before (by me). ...But that was grep and this is gsub, so I'll answer it again:

"\<" is an escape sequence for the beginning of a word, and "\>" is the end. In R strings you need to double the backslashes, so:

txt <- "a patterned layer within a microelectronic pattern."
txt_replaced <- gsub("\\<pattern\\>","form",txt)
txt_replaced
# [1] "a patterned layer within a microelectronic form."
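
To see which occurrences each pattern actually matches, one option is to compare match positions with gregexpr(). This is only a small sketch; the variable names plain_hits and word_hits are made up for illustration:

```r
txt <- "a patterned layer within a microelectronic pattern."

# Start positions of every match of the unanchored pattern
plain_hits <- as.integer(gregexpr("pattern", txt)[[1]])

# Start positions when "pattern" must stand alone as a whole word
word_hits <- as.integer(gregexpr("\\<pattern\\>", txt)[[1]])

plain_hits  # includes the match inside "patterned"
word_hits   # only the standalone word
```

The unanchored pattern reports two start positions (one inside "patterned"), while the word-anchored version reports only the final standalone "pattern".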

Or, you could use \b instead of \< and \>. \b matches a word boundary, so it can be used at both ends:

txt_replaced <- gsub("\\bpattern\\b","form",txt)
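
As a quick check that the two spellings behave the same on this input, the sketch below runs both and compares the results (with_angle and with_b are illustrative names, not from the original post):

```r
txt <- "a patterned layer within a microelectronic pattern."

# Word-start / word-end anchors
with_angle <- gsub("\\<pattern\\>", "form", txt)

# Generic word-boundary anchor on both sides
with_b <- gsub("\\bpattern\\b", "form", txt)

with_b
# [1] "a patterned layer within a microelectronic form."

identical(with_angle, with_b)
# [1] TRUE
```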

Also note that if you want to replace only ONE occurrence, you should use sub instead of gsub.
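
A small illustration of the difference, using a made-up sentence (txt2 is hypothetical, chosen so that the whole word "pattern" appears twice):

```r
# Hypothetical input with two standalone occurrences of "pattern"
txt2 <- "a pattern within a patterned pattern."

# sub() replaces only the first whole-word match
first_only <- sub("\\bpattern\\b", "form", txt2)
first_only
# [1] "a form within a patterned pattern."

# gsub() replaces every whole-word match
all_matches <- gsub("\\bpattern\\b", "form", txt2)
all_matches
# [1] "a form within a patterned form."
```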
