Suppose that I have the following sentence:
bean likes to sell his beans
and I want to replace all occurrences of specific words with other words. For example, bean with robert and beans with cars.
I can't just use str.replace, because in this case it will also change beans to roberts:
>>> "bean likes to sell his beans".replace("bean","robert")
'robert likes to sell his roberts'
I need to replace whole words only, not occurrences of the word inside another word. I think I can achieve this with regular expressions, but I don't know how to do it right.
4 Answers
#1
15
If you use regex, you can specify word boundaries with \b:
import re
sentence = 'bean likes to sell his beans'
sentence = re.sub(r'\bbean\b', 'robert', sentence)
# 'robert likes to sell his beans'
Here 'beans' is not changed (to 'roberts') because there is no word boundary between 'bean' and the trailing 's': \b matches the empty string, but only at the beginning or end of a word.
The second replacement, for completeness:
sentence = re.sub(r'\bbeans\b', 'cars', sentence)
# 'robert likes to sell his cars'
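When the word to replace comes from a variable rather than being typed into the pattern, the same idea can be wrapped in a small function. This is only a sketch and not part of the original answer: replace_word is a hypothetical helper name, and re.escape is added on the assumption that the word might contain regex metacharacters. It also assumes the word starts and ends with a word character, since \b only matches next to one.

```python
import re

def replace_word(text, old, new):
    """Replace whole-word occurrences of `old` in `text` with `new`.

    Hypothetical helper for illustration; `old` is escaped so that it is
    matched literally rather than interpreted as a regex.
    """
    pattern = r'\b' + re.escape(old) + r'\b'
    # Passing a function as the replacement avoids backslash processing in `new`.
    return re.sub(pattern, lambda m: new, text)

sentence = 'bean likes to sell his beans'
sentence = replace_word(sentence, 'bean', 'robert')
sentence = replace_word(sentence, 'beans', 'cars')
print(sentence)  # robert likes to sell his cars
```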
#2
3
If you replace the words one at a time, a word produced by one replacement can be replaced again by a later one (and you won't get what you want). To avoid this, do all the replacements in a single pass with a function or lambda:
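import re

d = {'bean': 'robert', 'beans': 'cars'}
str_in = 'bean likes to sell his beans'
# Look every word up in d; words that are not keys are left unchanged.
str_out = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)
# 'robert likes to sell his cars'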
That way, once bean is replaced by robert, it won't be modified again (even if robert is also in your input list of words).
As suggested by georg, I edited this answer to use dict.get(key, default_value). Alternative solution (also suggested by georg):
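# Match only the dictionary keys; re.escape keeps keys with regex metacharacters literal.
str_out = re.sub(r'\b(%s)\b' % '|'.join(map(re.escape, d)), lambda m: d.get(m.group(1), m.group(1)), str_in)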
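To make the difference between a single pass and one-at-a-time replacement concrete, here is a rough sketch. The function name replace_words and the second mapping entry ('robert' to 'alice') are made up for the illustration, and the mapping is assumed to be non-empty.

```python
import re

def replace_words(text, mapping):
    """Replace whole words in `text` according to `mapping` in one pass.

    Hypothetical helper; assumes `mapping` is non-empty.
    """
    # Longest keys first, so the alternation tries 'beans' before 'bean'
    # (with \b on both sides this is a precaution, not a requirement).
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, keys)))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# 'robert' is both a replacement value and a key here.
mapping = {'bean': 'robert', 'robert': 'alice'}
text = 'bean and robert sell beans'

single_pass = replace_words(text, mapping)

# Sequential replacement, one word at a time, for comparison.
sequential = text
for old, new in mapping.items():
    sequential = re.sub(r'\b%s\b' % re.escape(old), new, sequential)

print(single_pass)  # robert and alice sell beans
print(sequential)   # alice and alice sell beans
```

In the sequential version the robert produced by the first substitution is picked up by the second one, which is exactly the effect the single-pass approach avoids.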
#3
-1
"bean likes to sell his beans".replace("beans", "cars").replace("bean", "robert")
This will replace all instances of "beans" with "cars" and then "bean" with "robert". The order matters: the longer word has to be replaced first, because str.replace still matches substrings. Chaining works because .replace() returns a new, modified copy of the original string. As such, you can think of it in stages. It essentially works this way:
>>> first_string = "bean likes to sell his beans"
>>> second_string = first_string.replace("beans", "cars")
>>> third_string = second_string.replace("bean", "robert")
>>> print(first_string, second_string, third_string)
('bean likes to sell his beans', 'bean likes to sell his cars',
'robert likes to sell his cars')
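One caveat worth checking before relying on this: chained str.replace still works on substrings, so it only gives the right result when no other word in the text contains one of the search words. A quick comparison, using a made-up sentence with the extra word beanbags:

```python
import re

text = 'bean sells beanbags and beans'

# Chained str.replace works on substrings, so 'beanbags' is changed too.
chained = text.replace('beans', 'cars').replace('bean', 'robert')

# Word boundaries leave 'beanbags' alone.
bounded = re.sub(r'\bbeans\b', 'cars', text)
bounded = re.sub(r'\bbean\b', 'robert', bounded)

print(chained)  # robert sells robertbags and cars
print(bounded)  # robert sells beanbags and cars
```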
#4
-1
I know it's been a long time, but doesn't this look much more elegant?
import re
from functools import reduce  # reduce is a builtin only on Python 2
reduce(lambda x, y: re.sub(r'\b(' + y[0] + r')\b', y[1], x), [("bean", "robert"), ("beans", "cars")], "bean likes to sell his beans")