How do I clear a StringIO object?

Time: 2022-02-12 19:18:12

I have created a StringIO object and it has some text in it. I'd like to clear its existing contents and reuse it instead of creating a new one. Is there any way of doing this?

3 answers

#1


68  

TL;DR

Don't bother clearing it; just create a new one. It's faster.
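
A minimal Python 3 sketch of the "just create a new one" approach: each pass rebinds the name to a fresh io.StringIO instead of clearing the old one. The function name render_groups and the sample data are made up for illustration.

```python
from io import StringIO


def render_groups(groups):
    """Build one string per group, using a fresh buffer each time."""
    outputs = []
    for group in groups:
        buf = StringIO()  # new, empty buffer; the old one is simply dropped
        for line in group:
            buf.write(line + "\n")
        outputs.append(buf.getvalue())
    return outputs


rendered = render_groups([["a", "b"], ["c"]])
```

Nothing has to be truncated or rewound here, so the Python 2 versus Python 3 position difference never comes into play.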

The method

Python 2

Here's how I would find such things out:

>>> from StringIO import StringIO
>>> dir(StringIO)
['__doc__', '__init__', '__iter__', '__module__', 'close', 'flush', 'getvalue', 'isatty', 'next', 'read', 'readline', 'readlines', 'seek', 'tell', 'truncate', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method truncate in module StringIO:

truncate(self, size=None) unbound StringIO.StringIO method
    Truncate the file's size.

    If the optional size argument is present, the file is truncated to
    (at most) that size. The size defaults to the current position.
    The current file position is not changed unless the position
    is beyond the new file size.

    If the specified size exceeds the file's current size, the
    file remains unchanged.

So, you want .truncate(0). But it's probably cheaper (and easier) to initialise a new StringIO. See below for benchmarks.

Python 3

(Thanks to tstone2077 for pointing out the difference.)

>>> from io import StringIO
>>> dir(StringIO)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'getvalue', 'isatty', 'line_buffering', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method_descriptor:

truncate(...)
    Truncate size to pos.

    The pos argument defaults to the current file position, as
    returned by tell().  The current file position is unchanged.
    Returns the new absolute position.

Note that in Python 3 truncate() leaves the current file position unchanged, whereas in the Python 2 variant truncating to size zero also resets the position.

Thus, for Python 2, you only need:

>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write('foo')
>>> s.getvalue()
'foo'
>>> s.truncate(0)
>>> s.getvalue()
''
>>> s.write('bar')
>>> s.getvalue()
'bar'

If you do this in Python 3, you won't get the result you expected:

>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'\x00\x00\x00bar'
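
The null characters come from the stale position. A small sketch, assuming CPython 3's io.StringIO, that inspects tell() after truncating (the variable names position and padded are only for illustration):

```python
from io import StringIO

s = StringIO()
s.write('foo')
s.truncate(0)          # the contents are gone...
position = s.tell()    # ...but the position is still where 'foo' ended
s.write('bar')         # written at that old offset, so the gap gets padded
padded = s.getvalue()
```

Here position is 3 rather than 0, which is why the next write does not start at the beginning of the buffer.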

So in Python 3 you also need to reset the position:

>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.seek(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'bar'

If you use the truncate method in Python 2 code, it's safer to call seek(0) at the same time (before or after, it doesn't matter) so that the code won't break when you inevitably port it to Python 3. And that's another reason why you should just create a new StringIO object!
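
If you do want to reuse the object, the two calls can be wrapped in one place. This is a sketch only; the helper name clear is hypothetical, not part of the io module.

```python
from io import StringIO


def clear(sio):
    """Empty sio and rewind it so the next write starts at offset 0."""
    sio.seek(0)
    sio.truncate(0)
    return sio


s = StringIO()
s.write('foo')
clear(s)
s.write('bar')
result = s.getvalue()
```

With the position reset, result holds only the second write, with no padding in front of it.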

Times

Python 2

>>> from timeit import timeit
>>> def truncate(sio):
...     sio.truncate(0)
...     return sio
... 
>>> def new(sio):
...     return StringIO()
... 

When empty, with StringIO:

>>> from StringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
3.5194039344787598
>>> timeit(lambda: new(StringIO()))
3.6533868312835693

With 3KB of data in, with StringIO:

>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
4.3437709808349609
>>> timeit(lambda: new(StringIO('abc' * 1000)))
4.7179079055786133

And the same with cStringIO:

>>> from cStringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.55461597442626953
>>> timeit(lambda: new(StringIO()))
0.51241087913513184
>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
1.0958449840545654
>>> timeit(lambda: new(StringIO('abc' * 1000)))
0.98760509490966797

So, ignoring potential memory concerns (del oldstringio), it's faster to truncate a StringIO.StringIO (3% faster for empty, 8% faster for 3KB of data), but it's faster, and by a wider margin, to create a new cStringIO.StringIO (8% faster for empty, 10% faster for 3KB of data). So I'd recommend just using the easiest one: presuming you're working with CPython, use cStringIO and create new ones.

Python 3

The same code, just with seek(0) put in.

>>> def truncate(sio):
...     sio.truncate(0)
...     sio.seek(0)
...     return sio
... 
>>> def new(sio):
...     return StringIO()
...

When empty:

>>> from io import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.9706327870007954
>>> timeit(lambda: new(StringIO()))
0.8734330690022034

With 3KB of data in:

>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
3.5271066290006274
>>> timeit(lambda: new(StringIO('abc' * 1000)))
3.3496507499985455

So for Python 3, creating a new one instead of reusing a blank one is 11% faster, and creating a new one instead of reusing a 3KB one is 5% faster. Again, create a new StringIO rather than truncating and seeking.
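
To repeat the Python 3 measurement on your own machine, the interactive session can be turned into a script along these lines. It is a sketch: the function is named reuse here to keep it distinct from the method, and N is deliberately much smaller than timeit's default of 1,000,000 runs, so the absolute numbers will not match the ones above.

```python
from io import StringIO
from timeit import timeit


def reuse(sio):
    # Clear an existing buffer: drop the contents, then rewind.
    sio.truncate(0)
    sio.seek(0)
    return sio


def new(sio):
    # Ignore the old buffer and hand back a fresh one.
    return StringIO()


PAYLOAD = 'abc' * 1000  # roughly 3KB of text
N = 20000               # assumption: small run count to keep the script quick

timings = {
    'reuse, empty': timeit(lambda: reuse(StringIO()), number=N),
    'new, empty': timeit(lambda: new(StringIO()), number=N),
    'reuse, 3KB': timeit(lambda: reuse(StringIO(PAYLOAD)), number=N),
    'new, 3KB': timeit(lambda: new(StringIO(PAYLOAD)), number=N),
}
for label, seconds in timings.items():
    print('%-13s %.4f s' % (label, seconds))
```

As in the original session, every timed call also constructs a StringIO to pass in, so the differences between the pairs are what matter, not the totals.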

#2


7  

There is something important to note (at least with Python 3.2):

seek(0) IS needed before truncate(0). Here is some code without the seek(0):

from io import StringIO
s = StringIO()
s.write('1'*3)
print(repr(s.getvalue()))
s.truncate(0)
print(repr(s.getvalue()))
s.write('1'*3)
print(repr(s.getvalue()))

Which outputs:

'111'
''
'\x00\x00\x00111'

With seek(0) before the truncate, we get the expected output:

'111'
''
'111'
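
For reference, a sketch of the seek-first variant in Python 3. Because truncate() defaults to the current position, the size argument can be left out once the position is back at 0 (the variable name value is only for illustration):

```python
from io import StringIO

s = StringIO()
s.write('1' * 3)
s.seek(0)       # move to the start first
s.truncate()    # no argument: cut at the current position, which is now 0
s.write('1' * 3)
value = s.getvalue()
```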

#3


2  

I optimised my processing of many files in sequence (read in chunks, process each chunk, write the processed stream out to a file) by reusing the same cStringIO.StringIO instance: I always reset() it after use, then write to it, and then truncate(). This way, I only truncate the part at the end that I don't need for the current file. This seems to have given me a ~3% performance increase. Anybody more expert on this could confirm whether this indeed optimises memory allocation.

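import cStringIO  # Python 2 only

sio = cStringIO.StringIO()
for file in files:
    # Writes start at position 0 and overwrite the previous contents
    read_file_chunks_and_write_to_sio(file, sio)
    # Cut off whatever is left over from a longer previous file
    sio.truncate()
    with open('out.bla', 'w') as f:
        f.write(sio.getvalue())
    # Rewind to the start without discarding the buffer
    sio.reset()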
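
A rough Python 3 equivalent of the same pattern, as a sketch: io.StringIO has no reset(), so seek(0) stands in for it. The function name process_all and the upper-casing "processing" step are placeholders, not part of the original answer.

```python
from io import StringIO


def process_all(sources):
    """Reuse one buffer for several inputs; return the text built per input."""
    sio = StringIO()
    results = []
    for chunks in sources:
        for chunk in chunks:
            sio.write(chunk.upper())  # placeholder for real processing
        sio.truncate()                # drop leftovers from a longer previous input
        results.append(sio.getvalue())
        sio.seek(0)                   # rewind so the next input overwrites from 0
    return results


outputs = process_all([["abc", "def"], ["xy"], []])
```

The second input is shorter than the first, so the truncate() after writing is what removes the stale tail of the earlier data.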
