Is there any efficient mass string concatenation method in Python (like StringBuilder in C# or StringBuffer in Java)? I found the following methods here:
- Simple concatenation using +
- Using a string list and the join method
- Using UserString from the MutableString module
- Using a character array and the array module
- Using cStringIO from the StringIO module
But what do you experts use or suggest, and why?
(A related question)
12 Answers
#1
100
You may be interested in this: An optimization anecdote by Guido. Although it is worth remembering that this is an old article and it predates the existence of things like ''.join (although I guess string.joinfields is more-or-less the same).
On the strength of that, the array module may be fastest if you can shoehorn your problem into it. But ''.join is probably fast enough and has the benefit of being idiomatic, and thus easier for other Python programmers to understand.
Finally, the golden rule of optimization: don't optimize unless you know you need to, and measure rather than guess.
You can measure different methods using the timeit module. That can tell you which is fastest, instead of random strangers on the internet making guesses.
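For instance, a minimal timeit sketch along those lines might look like the following (the sample data, iteration count, and variable names are arbitrary choices for illustration, not taken from any answer here):

# Minimal, illustrative sketch: compare ''.join against += on the same pieces.
import timeit

setup = "parts = [str(i) for i in range(1000)]"

join_time = timeit.timeit("''.join(parts)", setup=setup, number=10000)
plus_time = timeit.timeit("s = ''\nfor p in parts:\n    s += p",
                          setup=setup, number=10000)

print('join: %.4f s   +=: %.4f s' % (join_time, plus_time))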
#2
53
''.join(sequenceofstrings) is what usually works best -- simplest and fastest.
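A tiny illustration of the idiomatic pattern (accumulate the pieces, join exactly once at the end); the variable names are just for the example:

# Build the pieces cheaply, then do a single join at the end.
pieces = []
for i in range(5):
    pieces.append(str(i))          # list appends are cheap
result = ''.join(pieces)           # one concatenation pass
print(result)                      # -> 01234

# The same thing, passing a generator expression straight to join:
result = ''.join(str(i) for i in range(5))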
#3
32
It depends on what you're doing.
After Python 2.5, string concatenation with the + operator is pretty fast. If you're just concatenating a couple of values, using the + operator works best:
>>> x = timeit.Timer(stmt="'a' + 'b'")
>>> x.timeit()
0.039999961853027344
>>> x = timeit.Timer(stmt="''.join(['a', 'b'])")
>>> x.timeit()
0.76200008392333984
However, if you're putting together a string in a loop, you're better off using the list joining method:
>>> join_stmt = """
... joined_str = ''
... for i in xrange(100000):
...     joined_str += str(i)
... """
>>> x = timeit.Timer(join_stmt)
>>> x.timeit(100)
13.278000116348267
>>> list_stmt = """
... str_list = []
... for i in xrange(100000):
...     str_list.append(str(i))
... ''.join(str_list)
... """
>>> x = timeit.Timer(list_stmt)
>>> x.timeit(100)
12.401000022888184
...but notice that you have to be putting together a relatively high number of strings before the difference becomes noticeable.
#4
28
Python 3.6 changed the game for string concatenation of known components with Literal String Interpolation.
Given the test case from mkoistinen's answer, having strings
domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'
The contenders are
- f'http://{domain}/{lang}/{path}' - 0.151 µs
- 'http://%s/%s/%s' % (domain, lang, path) - 0.321 µs
- 'http://' + domain + '/' + lang + '/' + path - 0.356 µs
- ''.join(('http://', domain, '/', lang, '/', path)) - 0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant list).
Thus, currently, the shortest and most beautiful code possible is also the fastest.
In alpha versions of the Python 3.6 implementation, f'' strings were the slowest possible - actually the generated bytecode is pretty much equivalent to the ''.join() case, with unnecessary calls to str.__format__ which, without arguments, would just return self unchanged. These inefficiencies were addressed before 3.6 final.
The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; that takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.
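A rough way to reproduce per-call timings like these on your own machine (requires Python 3.6+ for the f-string contender; absolute numbers will of course differ) could be a loop over the four statements with timeit; this is a hypothetical sketch, not the original benchmark script:

# With number=1_000_000, the total time in seconds equals µs per call.
import timeit

setup = (
    "domain = 'some_really_long_example.com'; "
    "lang = 'en'; "
    "path = 'some/really/long/path/'"
)

contenders = {
    'f-string': "f'http://{domain}/{lang}/{path}'",
    '% interp': "'http://%s/%s/%s' % (domain, lang, path)",
    '+ concat': "'http://' + domain + '/' + lang + '/' + path",
    'join':     "''.join(('http://', domain, '/', lang, '/', path))",
}

for name, stmt in contenders.items():
    total = timeit.timeit(stmt, setup=setup, number=1_000_000)
    print(f'{name:<8} {total:.3f} µs per call')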
#5
11
As per John Fouhy's answer, don't optimize unless you have to, but if you're here and asking this question, it may be precisely because you have to. In my case, I needed to assemble some URLs from string variables... fast. I noticed no one (so far) seems to be considering the string format method, so I thought I'd try that and, mostly for mild interest, I thought I'd toss the string interpolation operator in there for good measure. To be honest, I didn't think either of these would stack up to a direct '+' operation or a ''.join(). But guess what? On my Python 2.7.5 system, the string interpolation operator rules them all and string.format() is the worst performer:
# concatenate_test.py
from __future__ import print_function
import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'
iterations = 1000000

def meth_plus():
    '''Using + operator'''
    return 'http://' + domain + '/' + lang + '/' + path

def meth_join():
    '''Using ''.join()'''
    return ''.join(['http://', domain, '/', lang, '/', path])

def meth_form():
    '''Using string.format'''
    return 'http://{0}/{1}/{2}'.format(domain, lang, path)

def meth_intp():
    '''Using string interpolation'''
    return 'http://%s/%s/%s' % (domain, lang, path)

plus = timeit.Timer(stmt="meth_plus()", setup="from __main__ import meth_plus")
join = timeit.Timer(stmt="meth_join()", setup="from __main__ import meth_join")
form = timeit.Timer(stmt="meth_form()", setup="from __main__ import meth_form")
intp = timeit.Timer(stmt="meth_intp()", setup="from __main__ import meth_intp")

plus.val = plus.timeit(iterations)
join.val = join.timeit(iterations)
form.val = form.timeit(iterations)
intp.val = intp.timeit(iterations)

min_val = min([plus.val, join.val, form.val, intp.val])

print('plus %0.12f (%0.2f%% as fast)' % (plus.val, (100 * min_val / plus.val), ))
print('join %0.12f (%0.2f%% as fast)' % (join.val, (100 * min_val / join.val), ))
print('form %0.12f (%0.2f%% as fast)' % (form.val, (100 * min_val / form.val), ))
print('intp %0.12f (%0.2f%% as fast)' % (intp.val, (100 * min_val / intp.val), ))
The results:
# python2.7 concatenate_test.py
plus 0.360787868500 (90.81% as fast)
join 0.452811956406 (72.36% as fast)
form 0.502608060837 (65.19% as fast)
intp 0.327636957169 (100.00% as fast)
If I use a shorter domain and shorter path, interpolation still wins out. The difference is more pronounced, though, with longer strings.
Now that I had a nice test script, I also tested under Python 2.6, 3.3, 3.4 and 3.5; here are the results. In Python 2.6, the plus operator is the fastest! On Python 3, join wins out. Note: these tests are very repeatable on my system. So, 'plus' is always faster on 2.6, 'intp' is always faster on 2.7 and 'join' is always faster on Python 3.x.
# python2.6 concatenate_test.py
plus 0.338213920593 (100.00% as fast)
join 0.427221059799 (79.17% as fast)
form 0.515371084213 (65.63% as fast)
intp 0.378169059753 (89.43% as fast)
# python3.3 concatenate_test.py
plus 0.409130576998 (89.20% as fast)
join 0.364938726001 (100.00% as fast)
form 0.621366866995 (58.73% as fast)
intp 0.419064424001 (87.08% as fast)
# python3.4 concatenate_test.py
plus 0.481188605998 (85.14% as fast)
join 0.409673971997 (100.00% as fast)
form 0.652010936996 (62.83% as fast)
intp 0.460400978001 (88.98% as fast)
# python3.5 concatenate_test.py
plus 0.417167026084 (93.47% as fast)
join 0.389929617057 (100.00% as fast)
form 0.595661019906 (65.46% as fast)
intp 0.404455224983 (96.41% as fast)
Lessons learned:
- Sometimes, my assumptions are dead wrong.
- Test against the system environment you'll be running in production.
- String interpolation isn't dead yet!
tl;dr:
- If you're using 2.6, use the + operator.
- If you're using 2.7, use the '%' operator.
- If you're using 3.x, use ''.join().
#6
7
This URL has comparisons of the different approaches, along with some benchmarking:
http://skymind.com/~ocrow/python_string/
Please note: this is a very old comparison from pre-2009 based on Python 2.2, and so should, in most cases, be disregarded.
#7
4
It pretty much depends on the relative sizes of the new string after every new concatenation. With the + operator, a new string is made for every concatenation. If the intermediary strings are relatively long, + becomes increasingly slower because the new intermediary string has to be stored each time.
Consider this case:
from time import time
stri=''
a='aagsdfghfhdyjddtyjdhmfghmfgsdgsdfgsdfsdfsdfsdfsdfsdfddsksarigqeirnvgsdfsdgfsdfgfg'
l=[]
#case 1
t=time()
for i in range(1000):
    stri=stri+a+repr(i)
print time()-t
#case 2
t=time()
for i in xrange(1000):
    l.append(a+repr(i))
z=''.join(l)
print time()-t
#case 3
t=time()
for i in range(1000):
    stri=stri+repr(i)
print time()-t
#case 4
t=time()
for i in xrange(1000):
    l.append(repr(i))
z=''.join(l)
print time()-t
Results
1 0.00493192672729
2 0.000509023666382
3 0.00042200088501
4 0.000482797622681
In cases 1 & 2, we add a large string, and join() performs about 10 times faster. In cases 3 & 4, we add a small string, and '+' performs slightly faster.
#8
2
I ran into a situation where I needed to have an appendable string of unknown size. These are the benchmark results (Python 2.7.3):
$ python -m timeit -s 's=""' 's+="a"'
10000000 loops, best of 3: 0.176 usec per loop
$ python -m timeit -s 's=[]' 's.append("a")'
10000000 loops, best of 3: 0.196 usec per loop
$ python -m timeit -s 's=""' 's="".join((s,"a"))'
100000 loops, best of 3: 16.9 usec per loop
$ python -m timeit -s 's=""' 's="%s%s"%(s,"a")'
100000 loops, best of 3: 19.4 usec per loop
This seems to show that '+=' is the fastest. The results from the skymind link are a bit out of date.
(I realize that the second example is not complete; the final list would still need to be joined. This does show, however, that simply preparing the list takes longer than the string concat.)
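For a fully end-to-end comparison that does include the final join, one could time complete build functions rather than single operations. A rough sketch; the function names and piece count below are made up for illustration:

# Hypothetical sketch: time whole build functions so the final join is included.
import timeit

N = 10000  # arbitrary number of appended pieces

def build_plus():
    s = ''
    for _ in range(N):
        s += 'a'
    return s

def build_join():
    parts = []
    for _ in range(N):
        parts.append('a')
    return ''.join(parts)

for fn in (build_plus, build_join):
    secs = timeit.timeit(fn, number=100)
    print('%s: %.4f s for 100 runs' % (fn.__name__, secs))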
#9
2
One year later, let's test mkoistinen's answer with Python 3.4.3:
- plus 0.963564149000 (95.83% as fast)
- join 0.923408469000 (100.00% as fast)
- form 1.501130934000 (61.51% as fast)
- intp 1.019677452000 (90.56% as fast)
Nothing changed. Join is still the fastest method. With intp being arguably the best choice in terms of readability, you might want to use it nevertheless.
#10
0
Inspired by @JasonBaker's benchmarks, here's a simple one comparing 10 "abcdefghijklmnopqrstuvxyz" strings, showing that .join() is faster; even with this tiny increase in variables:
Catenation
>>> x = timeit.Timer(stmt='"abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz"')
>>> x.timeit()
0.9828147209324385
Join
>>> x = timeit.Timer(stmt='"".join(["abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz"])')
>>> x.timeit()
0.6114138159765048
#11
0
For a small set of short strings (i.e. 2 or 3 strings of no more than a few characters), plus is still way faster. Using mkoistinen's wonderful script in Python 2 and 3:
plus 2.679107467004 (100.00% as fast)
join 3.653773699996 (73.32% as fast)
form 6.594011374000 (40.63% as fast)
intp 4.568015249999 (58.65% as fast)
So when your code is doing a huge number of separate small concatenations, plus is the preferred way if speed is crucial.
#12
0
Probably "new f-strings in Python 3.6" is the most efficient way of concatenating strings.
Using %s
>>> timeit.timeit("""name = "Some"
... age = 100
... '%s is %s.' % (name, age)""", number = 10000)
0.0029734770068898797
Using .format
>>> timeit.timeit("""name = "Some"
... age = 100
... '{} is {}.'.format(name, age)""", number = 10000)
0.004015227983472869
Using f
>>> timeit.timeit("""name = "Some"
... age = 100
... f'{name} is {age}.'""", number = 10000)
0.0019175919878762215
Source: https://realpython.com/python-f-strings/