I have a generator for a large set of items. I want to iterate through them once, outputting them to a file. However, with the file format I currently have, I first have to output the number of items I have. I don't want to build a list of the items in memory, as there are too many of them and that would take a lot of time and memory. Is there a way to iterate through the generator, getting its length, but somehow be able to iterate through it again later, getting the same items?
If not, what other solution could I come up with for this problem?
3 answers
#1
5
If you can figure out how to just write a formula to calculate the size based on the parameters that control the generator, do that. Otherwise, I don't think you would save much time.
Include the generator here, and we'll try to do it for you!
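Since the generator in question was not posted, here is only a rough sketch of the idea using a hypothetical generator, `pairs(n)`, whose size happens to have a closed form. Both `pairs` and `pair_count` are made-up names for illustration; the real formula depends entirely on how your generator is parameterized.

```python
import math
from itertools import combinations


def pairs(n):
    """Hypothetical generator: every unordered pair drawn from range(n)."""
    return combinations(range(n), 2)


def pair_count(n):
    """Size of pairs(n), computed from the parameter without iterating."""
    return math.comb(n, 2)


# The count is available up front, so it can be written to the file
# before the single pass over the generator.
count = pair_count(1000)
```

`math.comb` requires Python 3.8 or later.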
#2
5
This cannot be done. Once a generator is exhausted it needs to be reconstructed in order to be used again. It is possible to define the __len__() method on an iterator object if the number of items is known ahead of time, and then len() can be called against the iterator object.
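A minimal sketch of that idea, assuming the count really is known ahead of time. `CountedItems` and `squares` are hypothetical names. Note that this wraps a generator function in a re-iterable object rather than putting `__len__` on the iterator itself, so each `iter()` call reconstructs the generator.

```python
class CountedItems:
    """Re-iterable wrapper around a generator function with a known item count."""

    def __init__(self, make_items, count):
        self._make_items = make_items  # zero-argument callable returning a fresh generator
        self._count = count            # supplied by the caller; not verified here

    def __iter__(self):
        # Reconstruct the generator on every pass.
        return self._make_items()

    def __len__(self):
        return self._count


def squares():
    """Hypothetical generator that is known to yield exactly 10 items."""
    for i in range(10):
        yield i * i


items = CountedItems(squares, 10)
```

With this in place, `len(items)` gives the header value and `for item in items:` streams the items without holding them all in memory.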
#3
5
I don't think that is possible for any generalized iterator. You will need to figure out how the generator was originally constructed and then regenerate it for the final pass.
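As a sketch of the regenerate-it approach: if you can wrap the construction in a function, you can run it once to count and once more to write. This assumes the generator is deterministic and cheap enough to run twice. `write_with_count` and `make_items` are hypothetical names, and the one-item-per-line text format is only for illustration.

```python
import io


def write_with_count(make_items, out):
    """Write the item count, then the items, using two passes over fresh generators."""
    # First pass: count without storing anything.
    count = sum(1 for _ in make_items())
    out.write(f"{count}\n")
    # Second pass: a newly constructed generator yields the same items again.
    for item in make_items():
        out.write(f"{item}\n")
    return count


def make_items():
    """Hypothetical stand-in for however the real generator is constructed."""
    return (i * 2 for i in range(5))


buf = io.StringIO()
write_with_count(make_items, buf)
```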
Alternatively, you could write out a dummy size to your file, write the items, and then reopen the file for modification and correct the size in the header.
If your file is a binary format, this could work quite well, since the number of bytes used for the size is the same regardless of what the actual size is. If it is a text format, you may have to insert extra characters into the file afterwards, unless you pad the dummy size widely enough to cover all cases. See this question for a discussion on inserting and rewriting in a text file using Python.
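For the binary case, a minimal sketch might look like the following. The layout is an assumption made up for the example (an 8-byte little-endian count followed by 4-byte signed integers), and `write_counted` and `read_counted` are hypothetical helpers. Instead of reopening the file, this version seeks back to the start on the same handle, which has the same effect.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<Q")  # assumed header: one unsigned 64-bit item count
RECORD = struct.Struct("<i")  # assumed record: one signed 32-bit integer


def write_counted(path, items):
    """Stream items to path in one pass, then patch the real count into the header."""
    count = 0
    with open(path, "wb") as f:
        f.write(HEADER.pack(0))          # dummy size, same width as the real one
        for item in items:
            f.write(RECORD.pack(item))
            count += 1
        f.seek(0)                        # back to the header
        f.write(HEADER.pack(count))      # overwrite in place; nothing shifts
    return count


def read_counted(path):
    """Read the header count, then that many records."""
    with open(path, "rb") as f:
        (count,) = HEADER.unpack(f.read(HEADER.size))
        return [RECORD.unpack(f.read(RECORD.size))[0] for _ in range(count)]


# Usage: the generator is consumed exactly once.
fd, path = tempfile.mkstemp()
os.close(fd)
written = write_counted(path, (i * i for i in range(100)))
result = read_counted(path)
os.remove(path)
```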
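For the text case, one way to sketch the padding idea is to reserve a fixed-width field and right-align the real count into it afterwards. The width of 12 and the one-item-per-line layout are assumptions for the example, `write_counted_text` is a hypothetical helper, and whatever reads the file has to tolerate the leading spaces.

```python
import io

WIDTH = 12  # assumed field width; must cover the largest count you expect


def write_counted_text(out, items):
    """Write a padded placeholder line, stream the items, then fill in the count."""
    start = out.tell()
    out.write(" " * WIDTH + "\n")        # dummy size, padded to a fixed width
    count = 0
    for item in items:
        out.write(f"{item}\n")
        count += 1
    field = f"{count:>{WIDTH}d}"
    if len(field) > WIDTH:
        raise ValueError("count does not fit in the reserved header field")
    end = out.tell()
    out.seek(start)
    out.write(field)                     # same length as the placeholder
    out.seek(end)                        # leave the position at the end of the data
    return count


buf = io.StringIO()
write_counted_text(buf, (c for c in "abc"))
header_line = buf.getvalue().splitlines()[0]
```

Here `int(header_line)` recovers the count, because `int()` ignores surrounding whitespace.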