I have a list of tuples, each containing a find/replace value that I would like to apply to a string. What would be the most efficient way to do so? I will be applying this iteratively, so performance is my biggest concern.
More concretely, what would the innards of processThis() look like?
x = 'find1, find2, find3'
y = [('find1', 'replace1'), ('find2', 'replace2'), ('find3', 'replace3')]

def processThis(str, lst):
    # Do something here
    return something

>>> processThis(x, y)
'replace1, replace2, replace3'
Thanks, all!
5 Solutions
#1
You could consider using re.sub:
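import re

REPLACEMENTS = dict([('find1', 'replace1'),
                     ('find2', 'replace2'),
                     ('find3', 'replace3')])

def replacer(m):
    # Look up the replacement for whichever key matched
    return REPLACEMENTS[m.group(0)]

x = 'find1, find2, find3'
# Escape the keys so regex metacharacters in them are matched literally
r = re.compile('|'.join(re.escape(k) for k in REPLACEMENTS))
print(r.sub(replacer, x))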
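Since the question says the replacements will be applied repeatedly, here is a rough sketch of the same idea with the pattern compiled only once and reused. The name make_replacer is made up for illustration. Sorting the keys longest-first is an extra assumption on my part: regex alternation takes the first alternative that matches, so without it a key like 'find1' would shadow 'find10'.

```python
import re

def make_replacer(pairs):
    """Compile the alternation once and return a function that applies it."""
    table = dict(pairs)
    if not table:
        # Nothing to replace; an empty pattern would match everywhere.
        return lambda text: text
    # Longest keys first, so a longer key is tried before any key that is its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))

    def apply(text):
        # Single pass over the text; each match is looked up in the table.
        return pattern.sub(lambda m: table[m.group(0)], text)

    return apply

process = make_replacer([('find1', 'replace1'),
                         ('find2', 'replace2'),
                         ('find3', 'replace3')])
print(process('find1, find2, find3'))
```

The returned function can be called as many times as needed without rebuilding the pattern.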
#2
A couple of notes:
- The boilerplate argument applies: premature optimization, benchmarking, bottlenecks, 100 is small, etc.
- There are cases where the different solutions will return different results. If y = [('one', 'two'), ('two', 'three')] and x = 'one', then mhawke's solution gives you 'two' and Unknown's gives 'three'.
- Testing this out in a silly contrived example, mhawke's solution was a tiny bit faster. It should be easy to try it with your data, though.
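To make the difference in results concrete, here is a small sketch that runs both approaches on the example above and then times them with timeit. The function names and the benchmark input are my own placeholders, not anything from the other answers, and the timings will depend on your data.

```python
import re
import timeit

def replace_sequential(text, pairs):
    # Chained str.replace: later pairs see the output of earlier ones.
    for find, repl in pairs:
        text = text.replace(find, repl)
    return text

def replace_single_pass(text, pairs):
    # One regex pass: replaced text is never scanned again.
    table = dict(pairs)
    pattern = re.compile('|'.join(re.escape(k) for k in table))
    return pattern.sub(lambda m: table[m.group(0)], text)

y = [('one', 'two'), ('two', 'three')]
x = 'one'
print(replace_single_pass(x, y))  # two
print(replace_sequential(x, y))   # three

# Rough timing on made-up data; substitute your own string and pairs.
data = 'find1, find2, find3 ' * 100
pairs = [('find1', 'replace1'), ('find2', 'replace2'), ('find3', 'replace3')]
print(timeit.timeit(lambda: replace_sequential(data, pairs), number=1000))
print(timeit.timeit(lambda: replace_single_pass(data, pairs), number=1000))
```

Note that replace_single_pass rebuilds its pattern on every call here, so a version that compiles once up front may compare differently.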
#3
x = 'find1, find2, find3'
y = [('find1', 'replace1'), ('find2', 'replace2'), ('find3', 'replace3')]

def processThis(str, lst):
    for find, replace in lst:
        str = str.replace(find, replace)
    return str

>>> processThis(x, y)
'replace1, replace2, replace3'
#4
from functools import reduce  # reduce is a built-in in Python 2 but must be imported in Python 3
s = reduce(lambda x, repl: str.replace(x, *repl), lst, s)
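For anyone unsure how the one-liner fits into the question, a minimal sketch wrapping it in a function might look like this. The name process_this is just an illustration. It behaves like the plain loop in #3: each pair is applied in order to the result of the previous one.

```python
from functools import reduce

def process_this(text, pairs):
    # reduce threads the accumulated string through each (find, replace) pair in order.
    return reduce(lambda acc, pair: acc.replace(*pair), pairs, text)

x = 'find1, find2, find3'
y = [('find1', 'replace1'), ('find2', 'replace2'), ('find3', 'replace3')]
print(process_this(x, y))
```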
#5
Same answer as mhawke's, wrapped in a function str_replace:
import re

def str_replace(data, search_n_replace_dict):
    REPLACEMENTS = search_n_replace_dict
    def replacer(m):
        return REPLACEMENTS[m.group(0)]
    # Escape the keys so they are matched literally
    r = re.compile('|'.join(re.escape(k) for k in REPLACEMENTS))
    return r.sub(replacer, data)
We can then call this function as in the example below:
s = "abcd abcd efgh efgh;;;;;; lkmnkd kkkkk"
d = {'abcd': 'aaaa', 'efgh': 'eeee', 'mnkd': 'mmmm'}
print(s)
print("\n")
print(str_replace(s, d))
Output:
abcd abcd efgh efgh;;;;;; lkmnkd kkkkk
aaaa aaaa eeee eeee;;;;;; lkmmmm kkkkk