Replace all occurrences of specific words

Time: 2022-09-13 09:53:15

Suppose that I have the following sentence:

bean likes to sell his beans

and I want to replace all occurrences of specific words with other words. For example, bean with robert and beans with cars.

I can't just use str.replace because in this case it'll change the beans to roberts.

>>> "bean likes to sell his beans".replace("bean","robert")
'robert likes to sell his roberts'

I need to change whole words only, not occurrences of the word inside other words. I think I can achieve this with regular expressions, but I don't know how to do it right.

4 solutions

#1


15  

If you use regex, you can specify word boundaries with \b:

import re

sentence = 'bean likes to sell his beans'

sentence = re.sub(r'\bbean\b', 'robert', sentence)
# 'robert likes to sell his beans'

Here 'beans' is not changed (to 'roberts') because there is no word boundary between the 'n' and the trailing 's': \b matches the empty string, but only at the beginning or end of a word.

The second replacement, for completeness:

sentence = re.sub(r'\bbeans\b', 'cars', sentence)
# 'robert likes to sell his cars'
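
If the words to replace come from variables rather than literals, the two calls above could be wrapped in a small helper. This is only a sketch: the name replace_whole_word is made up for illustration, and it assumes the words begin and end with word characters, since \b is defined relative to \w.

```python
import re

def replace_whole_word(text, old, new):
    """Replace `old` with `new` only where `old` stands as a whole word."""
    # re.escape keeps any regex metacharacters in `old` literal.
    pattern = r'\b' + re.escape(old) + r'\b'
    # A function replacement avoids backslash processing in `new`.
    return re.sub(pattern, lambda m: new, text)

sentence = 'bean likes to sell his beans'
sentence = replace_whole_word(sentence, 'bean', 'robert')
sentence = replace_whole_word(sentence, 'beans', 'cars')
print(sentence)  # robert likes to sell his cars
```

Punctuation next to a word also counts as a boundary, so 'bean, beans and bean.' would become 'robert, beans and robert.' with the first call.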

#2


3  

If you replace each word one at a time, you might replace words several times (and not get what you want). To avoid this, you can use a function or lambda:

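import re
d = {'bean': 'robert', 'beans': 'cars'}
str_in = 'bean likes to sell his beans'
# Look up every whole word in d; words without an entry are returned unchanged.
str_out = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)
# 'robert likes to sell his cars'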

That way, once bean is replaced by robert, it won't be modified again (even if robert is also in your input list of words).
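
To make the difference concrete, here is a small comparison. The mapping and the sentence are hypothetical, chosen so that one replacement ('robert') is itself a key; the loop stands in for the "one word at a time" approach.

```python
import re

# Hypothetical mapping in which one replacement is also a key.
d = {'bean': 'robert', 'robert': 'alice'}
str_in = 'bean knows robert'

# One re.sub call per word: the second call rewrites the output of the first.
sequential = str_in
for old, new in d.items():
    sequential = re.sub(r'\b%s\b' % re.escape(old), new, sequential)

# Single pass: each word of the input is looked up exactly once.
single_pass = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)

print(sequential)   # alice knows alice
print(single_pass)  # robert knows alice
```

The loop relies on dicts preserving insertion order (Python 3.7+), so 'bean' is processed before 'robert'.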

As suggested by georg, I edited this answer with dict.get(key, default_value). Alternative solution (also suggested by georg):

# re.escape guards against keys that contain regex metacharacters.
str_out = re.sub(r'\b(%s)\b' % '|'.join(map(re.escape, d)), lambda m: d.get(m.group(1), m.group(1)), str_in)
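
For repeated use, the alternation approach might be packaged as a function along these lines. This is a sketch, not part of the original answer: replace_words is a hypothetical name, and sorting the keys longest-first is an added precaution for keys that contain spaces or other non-word characters, where a shorter key could otherwise win the alternation.

```python
import re

def replace_words(text, mapping):
    """Replace whole words in `text` according to `mapping`, in a single pass."""
    if not mapping:
        return text
    # Longest keys first, so a key that is a prefix of another cannot match early.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, keys)))
    # Every match is one of the keys, so a plain lookup is enough.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

d = {'bean': 'robert', 'beans': 'cars'}
print(replace_words('bean likes to sell his beans', d))  # robert likes to sell his cars
```

Compared with matching every \w+ and looking it up, this only stops at words that are actually in the mapping, which may matter for long texts with few replacements.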

#3


-1  

"bean likes to sell his beans".replace("beans", "cars").replace("bean", "robert")

This replaces all instances of "beans" with "cars" and then all instances of "bean" with "robert". It works because .replace() returns a new string, so the calls can be chained, and because the longer word is replaced first. Note that it still matches substrings rather than whole words. You can think of it in stages:

>>> first_string = "bean likes to sell his beans"
>>> second_string = first_string.replace("beans", "cars")
>>> third_string = second_string.replace("bean", "robert")
>>> first_string, second_string, third_string
('bean likes to sell his beans', 'bean likes to sell his cars', 'robert likes to sell his cars')
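
Ordering the calls takes care of "beans", but str.replace has no notion of word boundaries. A quick check with a made-up sentence (not from the question) shows where the chained version and the \b version part ways:

```python
import re

text = 'bean wears a beanie'

# Plain str.replace matches substrings, so 'beanie' is altered as well.
chained = text.replace('beans', 'cars').replace('bean', 'robert')

# With word boundaries only the standalone word is replaced.
bounded = re.sub(r'\bbean\b', 'robert', text)

print(chained)  # robert wears a robertie
print(bounded)  # robert wears a beanie
```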

#4


-1  

I know it's been a long time, but does this look more elegant?

import functools, re  # reduce is no longer a builtin in Python 3
functools.reduce(lambda x, y: re.sub(r'\b(' + y[0] + r')\b', y[1], x), [("bean", "robert"), ("beans", "cars")], "bean likes to sell his beans")
