I have a Python script that takes a list of integers as input, and I need to work with four integers at a time. Unfortunately, I don't have control of the input, or I'd have it passed in as a list of four-element tuples. Currently, I'm iterating over it this way:
for i in xrange(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]
It looks a lot like "C-think", though, which makes me suspect there's a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn't be preserved. Perhaps something like this would be better?
while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []
Still doesn't quite "feel" right, though. :-/
Related question: How do you split a list into evenly sized chunks in Python?
32 solutions
#1
237
Modified from the recipes section of Python's itertools docs:
from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)
Example
In pseudocode to keep the example terse.
grouper('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'
Note: izip_longest is new in Python 2.6. In Python 3, use zip_longest.
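For readers on Python 3, the recipe above can be sketched as follows. This is only a minimal adaptation that swaps izip_longest for zip_longest as the note suggests; the groups variable is made up for the demonstration.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    # The same iterator object is repeated n times, so each output tuple
    # pulls n consecutive items from it.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

groups = [''.join(g) for g in grouper('ABCDEFG', 3, 'x')]
print(groups)  # ['ABC', 'DEF', 'Gxx']
```

Because every position in args refers to one shared iterator, the input is walked only once and never indexed, so this also works for generators.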
#2
310
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))
Simple. Easy. Fast. Works with any sequence:
text = "I am a very, very helpful text"
for group in chunker(text, 7):
    print repr(group),
# 'I am a ' 'very, v' 'ery hel' 'pful te' 'xt'
print '|'.join(chunker(text, 10))
# I am a ver|y, very he|lpful text
animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']
for group in chunker(animals, 3):
    print group
# ['cat', 'dog', 'rabbit']
# ['duck', 'bird', 'cow']
# ['gnu', 'fish']
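A Python 3 sketch of the same slicing approach, applied to the sum-of-products loop from the question. The sample list ints and the name foo are only illustrative, and the unpacking assumes the length is a multiple of four.

```python
def chunker(seq, size):
    # range replaces xrange on Python 3; slicing preserves the sequence type,
    # so strings give strings and lists give lists.
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

text = "I am a very, very helpful text"
print('|'.join(chunker(text, 10)))  # I am a ver|y, very he|lpful text

ints = [1, 2, 3, 4, 5, 6, 7, 8]  # sample data; length is a multiple of 4
foo = sum(a * b + c * d for a, b, c, d in chunker(ints, 4))
print(foo)  # 100
```

Since it relies on len() and slicing, this only works for sequences, not for arbitrary iterators.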
#3
87
I'm a fan of
chunkSize = 4
for i in xrange(0, len(ints), chunkSize):
    chunk = ints[i:i+chunkSize]
    # process chunk of size <= chunkSize
#4
20
import itertools
def chunks(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))
# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1,x2,x3,x4 in chunks(ints,4):
    foo += x1 + x2 + x3 + x4
for chunk in chunks(ints,4):
    foo += sum(chunk)
Another way:
import itertools
def chunks2(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))
# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1,x2,x3,x4 in chunks2(ints,4,0):
    foo += x1 + x2 + x3 + x4
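To make the ValueError remark concrete, here is a small Python 3 sketch of the first chunks function run on a list whose length is not a multiple of four. The sample data and the names total and error are made up for illustration.

```python
from itertools import islice

def chunks(iterable, size):
    it = iter(iterable)
    chunk = tuple(islice(it, size))
    while chunk:
        yield chunk
        chunk = tuple(islice(it, size))

ints = [1, 2, 3, 4, 5, 6]  # six items: one full chunk plus a short one
total = 0
error = None
try:
    for x1, x2, x3, x4 in chunks(ints, 4):
        total += x1 + x2 + x3 + x4
except ValueError as exc:
    # The short final chunk (5, 6) cannot be unpacked into four names.
    error = exc

print(total)  # 10, from the first full chunk only
```

Iterating with a single loop variable, as in the sum(chunk) variant, avoids the unpacking step and therefore the exception.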
#5
11
from itertools import izip_longest
def chunker(iterable, chunksize, filler):
    return izip_longest(*[iter(iterable)]*chunksize, fillvalue=filler)
#6
8
I needed a solution that would also work with sets and generators. I couldn't come up with anything very short and pretty, but it's quite readable at least.
def chunker(seq, size):
    res = []
    for el in seq:
        res.append(el)
        if len(res) == size:
            yield res
            res = []
    if res:
        yield res
List:
>>> list(chunker([i for i in range(10)], 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Set:
>>> list(chunker(set([i for i in range(10)]), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Generator:
>>> list(chunker((i for i in range(10)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
#7
6
Posting this as an answer since I cannot comment...
Using map() instead of zip() fixes the padding issue in J.F. Sebastian's answer:
>>> def chunker(iterable, chunksize):
...     return map(None,*[iter(iterable)]*chunksize)
Example:
>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9'), ('0', None, None)]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8'), ('9', '0', None, None)]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]
#8
5
Similar to other proposals, but not exactly identical, I like doing it this way, because it's simple and easy to read:
it = iter([1, 2, 3, 4, 5, 6, 7, 8, 9])
for chunk in zip(it, it, it, it):
    print chunk
# (1, 2, 3, 4)
# (5, 6, 7, 8)
This way you won't get the last partial chunk. If you want to get (9, None, None, None) as the last chunk, just use izip_longest from itertools.
#9
4
Since nobody's mentioned it yet, here's a zip() solution:
>>> def chunker(iterable, chunksize):
...     return zip(*[iter(iterable)]*chunksize)
It works only if your sequence's length is always divisible by the chunk size, or if you don't care about a trailing chunk when it isn't.
Example:
>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8')]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]
Or using itertools.izip to return an iterator instead of a list:
>>> from itertools import izip
>>> def chunker(iterable, chunksize):
...     return izip(*[iter(iterable)]*chunksize)
Padding can be fixed using @ΤΖΩΤΖΙΟΥ's answer:
>>> from itertools import chain, izip, repeat
>>> def chunker(iterable, chunksize, fillvalue=None):
...     it = chain(iterable, repeat(fillvalue, chunksize-1))
...     args = [it] * chunksize
...     return izip(*args)
#10
4
The ideal solution for this problem works with iterators (not just sequences). It should also be fast.
This is the solution provided by the documentation for itertools:
import itertools
def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)
Using IPython's %timeit on my MacBook Air, I get 47.5 µs per loop.
However, this really doesn't work for me since the results are padded to be even sized groups. A solution without the padding is slightly more complicated. The most naive solution might be:
def grouper(size, iterable):
    i = iter(iterable)
    while True:
        out = []
        try:
            for _ in range(size):
                out.append(i.next())
        except StopIteration:
            if out:  # avoid yielding an empty trailing group
                yield out
            break
        yield out
Simple, but pretty slow: 693 µs per loop.
The best solution I could come up with uses islice for the inner loop:
def grouper(size, iterable):
    it = iter(iterable)
    while True:
        group = tuple(itertools.islice(it, None, size))
        if not group:
            break
        yield group
With the same dataset, I get 305 µs per loop.
Unable to get a pure solution any faster than that, I provide the following solution with an important caveat: if your input data contains instances of fillvalue, you could get a wrong answer.
def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    for i in itertools.izip_longest(fillvalue=fillvalue, *args):
        if tuple(i)[-1] == fillvalue:
            yield tuple(v for v in i if v != fillvalue)
        else:
            yield i
I really don't like this answer, but it is significantly faster: 124 µs per loop.
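One possible way around the caveat about fill values appearing in the data is to pad with a private sentinel object and compare by identity. The sketch below is a Python 3 variant under that assumption; the name _FILL is hypothetical, and it has not been timed against the figures quoted in this answer.

```python
from itertools import zip_longest

_FILL = object()  # hypothetical private sentinel; no input item can be this object

def grouper(n, iterable):
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_FILL):
        if group[-1] is _FILL:
            # Only the final group can contain padding; strip it by identity,
            # so legitimate values such as None or 0 are never removed.
            yield tuple(v for v in group if v is not _FILL)
        else:
            yield group

print(list(grouper(3, 'ABCDEFG')))  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```

Because the comparison uses is rather than ==, input that happens to contain None (or any other value) is passed through untouched.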
#11
3
Using little functions and things really doesn't appeal to me; I prefer to just use slices:
data = [...]
chunk_size = 10000 # or whatever
chunks = [data[i:i+chunk_size] for i in xrange(0,len(data),chunk_size)]
for chunk in chunks:
    ...
#12
2
If the list is large, the highest-performing way to do this will be to use a generator:
def get_chunk(iterable, chunk_size):
    result = []
    for item in iterable:
        result.append(item)
        if len(result) == chunk_size:
            yield tuple(result)
            result = []
    if len(result) > 0:
        yield tuple(result)
for x in get_chunk([1,2,3,4,5,6,7,8,9,10], 3):
    print x
(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10,)
#13
2
Another approach would be to use the two-argument form of iter:
from itertools import islice
def group(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())
This can be adapted easily to use padding (this is similar to Markus Jarderot’s answer):
from itertools import islice, chain, repeat
def group_pad(it, size, pad=None):
    it = chain(iter(it), repeat(pad))
    return iter(lambda: tuple(islice(it, size)), (pad,) * size)
These can even be combined for optional padding:
_no_pad = object()
def group(it, size, pad=_no_pad):
    if pad is _no_pad:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(pad))
        sentinel = (pad,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)
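As a quick illustration of how the two-argument form of iter behaves here, the following Python 3 sketch runs the unpadded variant on a small range. The variable result is only for the demonstration.

```python
from itertools import islice

def group(iterable, size):
    it = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda and stops as soon as
    # it returns the sentinel, here the empty tuple produced once `it` is
    # exhausted.
    return iter(lambda: tuple(islice(it, size)), ())

result = list(group(range(10), 4))
print(result)  # [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]
```

One thing to keep in mind with the padded variant is that its sentinel is a full chunk of pad values, so input that itself contains such a chunk would end the iteration early.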
#14
2
With NumPy it's simple:
from numpy import array
ints = array([1, 2, 3, 4, 5, 6, 7, 8])
for int1, int2 in ints.reshape(-1, 2):
    print(int1, int2)
output:
1 2
3 4
5 6
7 8
#15
2
If you don't mind using an external package, you could use iteration_utilities.grouper from iteration_utilities [1]. It supports all iterables (not just sequences):
from iteration_utilities import grouper
seq = list(range(20))
for group in grouper(seq, 4):
    print(group)
which prints:
(0, 1, 2, 3)
(4, 5, 6, 7)
(8, 9, 10, 11)
(12, 13, 14, 15)
(16, 17, 18, 19)
In case the length isn't a multiple of the group size, it also supports filling the incomplete last group or truncating (discarding) it:
from iteration_utilities import grouper
seq = list(range(17))
for group in grouper(seq, 4):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16,)
for group in grouper(seq, 4, fillvalue=None):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16, None, None, None)
for group in grouper(seq, 4, truncate=True):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
[1] Disclaimer: I'm the author of that package.
#16
1
In your second method, I would advance to the next group of 4 by doing this:
ints = ints[4:]
However, I haven't done any performance measurement so I don't know which one might be more efficient.
Having said that, I would usually choose the first method. It's not pretty, but that's often a consequence of interfacing with the outside world.
#17
1
Yet another answer, the advantages of which are:
1) Easily understandable
2) Works on any iterable, not just sequences (some of the above answers will choke on filehandles)
3) Does not load the chunk into memory all at once
4) Does not make a chunk-long list of references to the same iterator in memory
5) No padding of fill values at the end of the list
That being said, I haven't timed it so it might be slower than some of the more clever methods, and some of the advantages may be irrelevant given the use case.
def chunkiter(iterable, size):
    def inneriter(first, iterator, size):
        yield first
        for _ in xrange(size - 1):
            yield iterator.next()
    it = iter(iterable)
    while True:
        yield inneriter(it.next(), it, size)
In [2]: i = chunkiter('abcdefgh', 3)
In [3]: for ii in i:
   ...:     for c in ii:
   ...:         print c,
   ...:     print ''
   ...:
a b c
d e f
g h
Update:
A couple of drawbacks due to the fact the inner and outer loops are pulling values from the same iterator:
1) continue doesn't work as expected in the outer loop - it just continues on to the next item rather than skipping a chunk. However, this doesn't seem like a problem as there's nothing to test in the outer loop.
2) break doesn't work as expected in the inner loop - control will wind up in the inner loop again with the next item in the iterator. To skip whole chunks, either wrap the inner iterator (ii above) in a tuple, e.g. for c in tuple(ii), or set a flag and exhaust the iterator.
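The code in this answer relies on Python 2 behaviour: it calls .next() on the iterator and lets StopIteration escape from inside a generator, which Python 3.7 and later turn into a RuntimeError (PEP 479). A rough Python 3 sketch of the same lazy idea, swapping the hand-written inner generator for chain and islice, might look like this (rows is just a demonstration variable):

```python
from itertools import chain, islice

def chunkiter(iterable, size):
    it = iter(iterable)
    # The for loop ends cleanly when `it` is exhausted, so no StopIteration
    # ever has to escape from the generator.
    for first in it:
        # Lazily yield `first` followed by up to size - 1 further items.
        yield chain([first], islice(it, size - 1))

rows = [' '.join(chunk) for chunk in chunkiter('abcdefgh', 3)]
print(rows)  # ['a b c', 'd e f', 'g h']
```

The drawbacks listed in the update still apply: each inner chunk shares the underlying iterator, so it should be fully consumed before moving on to the next one.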
#18
1
def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.
    >>> list(group_by([1,2,3,4,5], 2))
    [[1, 2], [3, 4], [5]]
    """
    sublist = []
    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []
        sublist.append(item)
    if sublist:
        yield sublist
#19
1
You can use the partition or chunks function from the funcy library:
from funcy import partition
for a, b, c, d in partition(4, ints):
    foo += a * b * c * d
These functions also have iterator versions, ipartition and ichunks, which will be more efficient in this case.
You can also peek at their implementation.
您还可以查看它们的实现。
#20
1
To avoid all conversions to a list, import itertools and:
>>> for k, g in itertools.groupby(xrange(35), lambda x: x/10):
...     print k, list(g)
Produces:
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
2 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
3 [30, 31, 32, 33, 34]
I checked groupby and it doesn't convert to list or use len, so I think this will delay resolution of each value until it is actually used. Sadly none of the available answers (at this time) seemed to offer this variation.
Obviously, if you need to handle each item in turn, nest a for loop over g:
for k,g in itertools.groupby(xrange(35), lambda x: x/10):
    for i in g:
        # do what you need to do with individual items
        pass
    # now do what you need to do with the whole group
My specific interest in this was the need to consume a generator to submit changes in batches of up to 1000 to the gmail API:
messages = a_generator_which_would_not_be_smart_as_a_list
counter = itertools.count()
# Key on the running position rather than on the message itself,
# since the messages are dict-like and cannot be divided.
for idx, batch in itertools.groupby(messages, lambda _: next(counter) // 1000):
    batch_request = BatchHttpRequest()
    for message in batch:
        batch_request.add(self.service.users().messages().modify(userId='me', id=message['id'], body=msg_labels))
    http = httplib2.Http()
    self.credentials.authorize(http)
    batch_request.execute(http=http)
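Note that the key function lambda x: x/10 divides the item itself, so it only groups correctly when the items happen to be their own indices (and, on Python 3, / is true division, which would give every item its own key). A more general Python 3 sketch keys on a running counter instead; the helper name batches is made up for this example.

```python
from itertools import count, groupby

def batches(iterable, size):
    counter = count()
    # The key ignores the item and uses its position in the stream, so the
    # items can be of any type. groupby calls the key once per item, in order.
    for _, group in groupby(iterable, key=lambda _item: next(counter) // size):
        yield group

result = [list(g) for g in batches('abcdefg', 3)]
print(result)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

As with groupby in general, each group is a lazy view over the shared input, so it needs to be consumed before the next group is requested.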
#21
1
About the solution given by J.F. Sebastian here:
def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)
It's clever, but has one disadvantage: it always returns tuples. How do you get strings instead?
Of course you can write ''.join(chunker(...)), but the temporary tuple is constructed anyway.
You can get rid of the temporary tuple by writing your own zip, like this:
class IteratorExhausted(Exception):
    pass
def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    raise to # StopIteration would get ignored because this is generator,
             # but custom exception can leave the generator.
def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted: # when any of iterators get exhausted.
            break
Then
def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)
Example usage:
>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'
#22
1
Here is a chunker without imports that supports generators:
def chunks(seq, size):
    it = iter(seq)
    while True:
        ret = tuple(it.next() for _ in range(size))
        if len(ret) == size:
            yield ret
        else:
            raise StopIteration()
Example of use:
>>> def foo():
...     i = 0
...     while True:
...         i += 1
...         yield i
...
>>> c = chunks(foo(), 3)
>>> c.next()
(1, 2, 3)
>>> c.next()
(4, 5, 6)
>>> list(chunks('abcdefg', 2))
[('a', 'b'), ('c', 'd'), ('e', 'f')]
#23
1
def chunker(iterable, n):
    """Yield iterable in chunk sizes.
    >>> chunks = chunker('ABCDEF', n=4)
    >>> chunks.next()
    ['A', 'B', 'C', 'D']
    >>> chunks.next()
    ['E', 'F']
    """
    it = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(it.next())
            except StopIteration:
                if chunk:  # don't yield an empty final chunk
                    yield chunk
                return
        yield chunk
if __name__ == '__main__':
    import doctest
    doctest.testmod()
#24
1
I like this approach. It feels simple and not magical and supports all iterable types and doesn't require imports.
def chunk_iter(iterable, chunk_size):
    it = iter(iterable)
    while True:
        chunk = tuple(next(it) for _ in range(chunk_size))
        if not chunk:
            break
        yield chunk
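One caveat worth knowing: on Python 3.7 and later, a StopIteration raised by next(it) inside a generator expression is converted into a RuntimeError (PEP 479), so the version above fails once the input runs out. A possible import-free variant, sketched here with a local sentinel named missing (a name chosen only for this example), passes a default to next() instead:

```python
def chunk_iter(iterable, chunk_size):
    it = iter(iterable)
    missing = object()  # local sentinel marking "iterator exhausted"
    while True:
        chunk = []
        for _ in range(chunk_size):
            # next() with a default never raises StopIteration.
            item = next(it, missing)
            if item is missing:
                break
            chunk.append(item)
        if not chunk:
            return
        yield tuple(chunk)

print(list(chunk_iter('abcdefg', 3)))  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g',)]
```

This keeps the properties the answer cares about (any iterable, no imports) at the cost of a slightly longer loop.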
#25
0
There doesn't seem to be a pretty way to do this. Here is a page that has a number of methods, including:
def split_seq(seq, size):
    newseq = []
    splitsize = 1.0/size*len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
    return newseq
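Judging from the code, the size argument of split_seq appears to be the number of pieces to produce rather than the length of each piece. A short sketch with made-up sample data shows the difference:

```python
def split_seq(seq, size):
    newseq = []
    # Fractional length of each piece; slice boundaries are rounded to ints.
    splitsize = 1.0 / size * len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i * splitsize)):int(round((i + 1) * splitsize))])
    return newseq

pieces = split_seq(list(range(10)), 3)
print(pieces)  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
```

So to get groups of four from a list of known length, the call would need len(seq) // 4 as the second argument, and the pieces are only guaranteed to be equal when the length divides evenly.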
#26
0
If the lists are the same size, you can combine them into lists of 4-tuples with zip(). For example:
# Four lists of four elements each.
l1 = range(0, 4)
l2 = range(4, 8)
l3 = range(8, 12)
l4 = range(12, 16)
for i1, i2, i3, i4 in zip(l1, l2, l3, l4):
    ...
Here's what the zip() function produces:
>>> print l1
[0, 1, 2, 3]
>>> print l2
[4, 5, 6, 7]
>>> print l3
[8, 9, 10, 11]
>>> print l4
[12, 13, 14, 15]
>>> print zip(l1, l2, l3, l4)
[(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
If the lists are large, and you don't want to combine them into a bigger list, use itertools.izip(), which produces an iterator, rather than a list.
from itertools import izip
for i1, i2, i3, i4 in izip(l1, l2, l3, l4):
    ...
#27
0
One-liner, ad hoc solution to iterate over a list x in chunks of size 4:
for a, b, c, d in zip(x[0::4], x[1::4], x[2::4], x[3::4]):
    ... do something with a, b, c and d ...
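A small runnable sketch of the strided-slice idea, with a made-up ten-element list, shows what happens to a trailing partial chunk:

```python
x = list(range(1, 11))  # sample data: 1..10, not a multiple of 4

# Each slice picks every fourth element starting at a different offset;
# zip then lines the four slices up position by position.
quads = list(zip(x[0::4], x[1::4], x[2::4], x[3::4]))
print(quads)  # [(1, 2, 3, 4), (5, 6, 7, 8)]
```

zip stops at the shortest slice, so the leftover 9 and 10 are silently dropped. The four slices are also copies of parts of the list, which may matter for large inputs.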
#28
0
At first, I designed it to split strings into substrings, in order to parse strings containing hex.
Today I turned it into a complex, but still simple, generator.
def chunker(iterable, size, reductor, condition):
    it = iter(iterable)
    def chunk_generator():
        return (next(it) for _ in range(size))
    chunk = reductor(chunk_generator())
    while condition(chunk):
        yield chunk
        chunk = reductor(chunk_generator())
Arguments:
Obvious ones
- iterable is any iterable / iterator / generator containing / generating / iterating over the input data,
- size is, of course, the size of the chunk you want to get,
More interesting
- reductor is a callable which receives a generator iterating over the content of a chunk. I'd expect it to return a sequence or string, but I don't demand that. You can pass as this argument, for example, list, tuple, set, frozenset, or anything fancier. I'd pass this function, returning a string (provided that iterable contains / generates / iterates over strings):
def concatenate(iterable): return ''.join(iterable)
Note that reductor can cause the generator to close by raising an exception.
- condition is a callable which receives whatever reductor returned. It decides to approve and yield it (by returning anything evaluating to True), or to decline it and finish the generator's work (by returning anything else or raising an exception).
When the number of elements in iterable is not divisible by size, then once it gets exhausted, reductor will receive a generator producing fewer elements than size. Let's call these elements the last elements.
I invented two functions to pass as this argument:
- lambda x: x - the last elements will be yielded.
- lambda x: len(x) == <size> - the last elements will be rejected. Replace <size> with a number equal to size.
#29
0
It is easy to make itertools.groupby work for you to get an iterable of iterables, without creating any temporary lists:
from itertools import groupby, count
groupby(iterable, (lambda x,y: (lambda z: x.next()/y))(count(),100))
Don't get put off by the nested lambdas; the outer lambda runs just once to put the count() generator and the constant 100 into the scope of the inner lambda.
I use this to send chunks of rows to mysql.
for k,v in groupby(bigdata, (lambda x,y: (lambda z: x.next()/y))(count(),100)):
    cursor.executemany(sql, v)
#30
0
Quite pythonic here (you may also inline the body of the split_groups function):
import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))
for x, y, z, w in split_groups(range(16), 4):
    foo += x * y + z * w
#1
237
Modified from the recipes section of Python's itertools docs:
修改了Python的itertools文档的recipe部分:
from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
args = [iter(iterable)] * n
return izip_longest(*args, fillvalue=fillvalue)
Example
In pseudocode to keep the example terse.
在伪代码中保持示例简洁。
grouper('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'
Note: izip_longest
is new to Python 2.6. In Python 3 use zip_longest
.
注意:izip_longest是Python 2.6的新特性。在Python 3中使用zip_longest。
#2
310
def chunker(seq, size):
return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))
Simple. Easy. Fast. Works with any sequence:
简单。一件容易的事。快。适用于任何序列:
text = "I am a very, very helpful text"
for group in chunker(text, 7):
print repr(group),
# 'I am a ' 'very, v' 'ery hel' 'pful te' 'xt'
print '|'.join(chunker(text, 10))
# I am a ver|y, very he|lpful text
animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']
for group in chunker(animals, 3):
print group
# ['cat', 'dog', 'rabbit']
# ['duck', 'bird', 'cow']
# ['gnu', 'fish']
#3
87
I'm a fan of
我的粉丝
chunkSize= 4
for i in xrange(0, len(ints), chunkSize):
chunk = ints[i:i+chunkSize]
# process chunk of size <= chunkSize
#4
20
import itertools
def chunks(iterable,size):
it = iter(iterable)
chunk = tuple(itertools.islice(it,size))
while chunk:
yield chunk
chunk = tuple(itertools.islice(it,size))
# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1,x2,x3,x4 in chunks(ints,4):
foo += x1 + x2 + x3 + x4
for chunk in chunks(ints,4):
foo += sum(chunk)
Another way:
另一种方法:
import itertools
def chunks2(iterable,size,filler=None):
it = itertools.chain(iterable,itertools.repeat(filler,size-1))
chunk = tuple(itertools.islice(it,size))
while len(chunk) == size:
yield chunk
chunk = tuple(itertools.islice(it,size))
# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1,x2,x3,x4 in chunks2(ints,4,0):
foo += x1 + x2 + x3 + x4
#5
11
from itertools import izip_longest
def chunker(iterable, chunksize, filler):
return izip_longest(*[iter(iterable)]*chunksize, fillvalue=filler)
#6
8
I needed a solution that would also work with sets and generators. I couldn't come up with anything very short and pretty, but it's quite readable at least.
我需要一个同样适用于集合和生成器的解决方案。我想不出什么又短又漂亮的东西,但至少它可读性很强。
def chunker(seq, size):
res = []
for el in seq:
res.append(el)
if len(res) == size:
yield res
res = []
if res:
yield res
List:
列表:
>>> list(chunker([i for i in range(10)], 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Set:
设置:
>>> list(chunker(set([i for i in range(10)]), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Generator:
发电机:
>>> list(chunker((i for i in range(10)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
#7
6
Posting this as an answer since I cannot comment...
因为我不能评论,所以把这贴出来作为回答。
Using map() instead of zip() fixes the padding issue in J.F. Sebastian's answer:
使用map()而不是zip()修复了J.F. Sebastian的填充问题:
>>> def chunker(iterable, chunksize):
... return map(None,*[iter(iterable)]*chunksize)
Example:
例子:
>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9'), ('0', None, None)]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8'), ('9', '0', None, None)]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]
#8
5
Similar to other proposals, but not exactly identical, I like doing it this way, because it's simple and easy to read:
与其他提案类似,但并非完全相同,我喜欢这样做,因为它简单易懂:
it = iter([1, 2, 3, 4, 5, 6, 7, 8, 9])
for chunk in zip(it, it, it, it):
print chunk
>>> (1, 2, 3, 4)
>>> (5, 6, 7, 8)
This way you won't get the last partial chunk. If you want to get (9, None, None, None)
as last chunk, just use izip_longest
from itertools
.
这样你就不会得到最后的部分。如果您想将(9,None, None, None)作为最后一个块,只需使用itertools中的izip_long。
#9
4
Since nobody's mentioned it yet here's a zip()
solution:
由于还没有人提到它,这里有一个zip()解决方案:
>>> def chunker(iterable, chunksize):
... return zip(*[iter(iterable)]*chunksize)
It works only if your sequence's length is always divisible by the chunk size or you don't care about a trailing chunk if it isn't.
只有当序列的长度总是可以被块大小整除,或者你不关心拖尾的块时,它才能工作。
Example:
例子:
>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8')]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]
Or using itertools.izip to return an iterator instead of a list:
出现或使用itertools。izip返回迭代器而不是列表:
>>> from itertools import izip
>>> def chunker(iterable, chunksize):
... return izip(*[iter(iterable)]*chunksize)
Padding can be fixed using @ΤΖΩΤΖΙΟΥ's answer:
填充可以使用@ΤΖΩΤΖΙΟΥ固定的回答是:
>>> from itertools import chain, izip, repeat
>>> def chunker(iterable, chunksize, fillvalue=None):
... it = chain(iterable, repeat(fillvalue, chunksize-1))
... args = [it] * chunksize
... return izip(*args)
#10
4
The ideal solution for this problem works with iterators (not just sequences). It should also be fast.
这个问题的理想解决方案是使用迭代器(而不仅仅是序列)。它也应该很快。
This is the solution provided by the documentation for itertools:
这是itertools文档提供的解决方案:
def grouper(n, iterable, fillvalue=None):
#"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return itertools.izip_longest(fillvalue=fillvalue, *args)
Using ipython's %timeit
on my mac book air, I get 47.5 us per loop.
在我的mac book air上使用ipython的%timeit,我每循环得到47.5个us。
However, this really doesn't work for me since the results are padded to be even sized groups. A solution without the padding is slightly more complicated. The most naive solution might be:
然而,这对我来说并不适用,因为结果被添加到更大的组中。没有填充的解决方案稍微复杂一些。最天真的解决办法可能是:
def grouper(size, iterable):
i = iter(iterable)
while True:
out = []
try:
for _ in range(size):
out.append(i.next())
except StopIteration:
yield out
break
yield out
Simple, but pretty slow: 693 us per loop
简单,但相当慢:693我们每个循环
The best solution I could come up with uses islice
for the inner loop:
我能想到的最好的解决方法是对内部循环使用islice:
def grouper(size, iterable):
it = iter(iterable)
while True:
group = tuple(itertools.islice(it, None, size))
if not group:
break
yield group
With the same dataset, I get 305 us per loop.
对于相同的数据集,我每循环得到305 us。
Unable to get a pure solution any faster than that, I provide the following solution with an important caveat: If your input data has instances of filldata
in it, you could get wrong answer.
由于无法获得比这更快的纯解决方案,我提供了一个重要的警告:如果您的输入数据中有filldata的实例,您可能会得到错误的答案。
def grouper(n, iterable, fillvalue=None):
#"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
for i in itertools.izip_longest(fillvalue=fillvalue, *args):
if tuple(i)[-1] == fillvalue:
yield tuple(v for v in i if v != fillvalue)
else:
yield i
I really don't like this answer, but it is significantly faster. 124 us per loop
我真的不喜欢这个答案,但它要快得多。124年美国每循环
#11
3
Using little functions and things really doesn't appeal to me; I prefer to just use slices:
使用小的函数和东西真的不吸引我;我更喜欢用切片:
data = [...]
chunk_size = 10000 # or whatever
chunks = [data[i:i+chunk_size] for i in xrange(0,len(data),chunk_size)]
for chunk in chunks:
...
#12
2
If the list is large, the highest-performing way to do this will be to use a generator:
如果列表很大,最有效的方法是使用生成器:
def get_chunk(iterable, chunk_size):
result = []
for item in iterable:
result.append(item)
if len(result) == chunk_size:
yield tuple(result)
result = []
if len(result) > 0:
yield tuple(result)
for x in get_chunk([1,2,3,4,5,6,7,8,9,10], 3):
print x
(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10,)
#13
2
Another approach would be to use the two-argument form of iter
:
另一种方法是使用iter的两参数形式:
from itertools import islice
def group(it, size):
it = iter(it)
return iter(lambda: tuple(islice(it, size)), ())
This can be adapted easily to use padding (this is similar to Markus Jarderot’s answer):
这可以很容易地适应使用填充(这类似于Markus Jarderot的答案):
from itertools import islice, chain, repeat
def group_pad(it, size, pad=None):
it = chain(iter(it), repeat(pad))
return iter(lambda: tuple(islice(it, size)), (pad,) * size)
These can even be combined for optional padding:
这些甚至可以用于可选填充:
_no_pad = object()
def group(it, size, pad=_no_pad):
if pad == _no_pad:
it = iter(it)
sentinel = ()
else:
it = chain(iter(it), repeat(pad))
sentinel = (pad,) * size
return iter(lambda: tuple(islice(it, size)), sentinel)
#14
2
With NumPy it's simple:
与NumPy很简单:
ints = array([1, 2, 3, 4, 5, 6, 7, 8])
for int1, int2 in ints.reshape(-1, 2):
print(int1, int2)
output:
输出:
1 2
3 4
5 6
7 8
#15
2
If you don't mind using an external package you could use iteration_utilities.grouper
from iteration_utilties
1. It supports all iterables (not just sequences):
如果不介意使用外部包,可以使用iteration_utilities。石斑鱼从iteration_utilties 1。它支持所有的iterables(不只是序列):
from iteration_utilities import grouper
seq = list(range(20))
for group in grouper(seq, 4):
print(group)
which prints:
打印:
(0, 1, 2, 3)
(4, 5, 6, 7)
(8, 9, 10, 11)
(12, 13, 14, 15)
(16, 17, 18, 19)
In case the length isn't a multiple of the groupsize it also supports filling (the incomplete last group) or truncating (discarding the incomplete last group) the last one:
如果长度不是组大小的倍数,它还支持填充(不完整的最后一组)或截断(丢弃不完整的最后一组)最后一组:
from iteration_utilities import grouper
seq = list(range(17))
for group in grouper(seq, 4):
print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16,)
for group in grouper(seq, 4, fillvalue=None):
print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16, None, None, None)
for group in grouper(seq, 4, truncate=True):
print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
1 Disclaimer: I'm the author of that package.
免责声明:我是这个包裹的作者。
#16
1
In your second method, I would advance to the next group of 4 by doing this:
在你的第二种方法中,我将通过以下方法进入下一个4人组:
ints = ints[4:]
However, I haven't done any performance measurement so I don't know which one might be more efficient.
但是,我还没有做任何性能度量,所以我不知道哪一个可能更有效。
Having said that, I would usually choose the first method. It's not pretty, but that's often a consequence of interfacing with the outside world.
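For anyone who wants to measure rather than guess, a rough sketch with timeit (assuming Python 3) could look like the following. The function names by_index and by_reslicing are made up for this example, and the timings will vary by machine, so none are quoted here.

```python
import timeit

def by_index(ints):
    # First method from the question: step through the list by index.
    total = 0
    for i in range(0, len(ints), 4):
        total += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]
    return total

def by_reslicing(ints):
    # Second method with the suggested change: rebind to the remaining tail.
    total = 0
    while ints:
        total += ints[0] * ints[1] + ints[2] * ints[3]
        ints = ints[4:]  # copies the remaining items on every pass
    return total

data = list(range(4000))
assert by_index(data) == by_reslicing(data)

for fn in (by_index, by_reslicing):
    seconds = timeit.timeit(lambda: fn(data), number=20)
    print(fn.__name__, seconds)
```

Since each ints[4:] copies the rest of the list, the reslicing version does more work as the input grows, which is worth keeping in mind when reading the numbers.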
#17
1
Yet another answer, the advantages of which are:
1) Easily understandable
2) Works on any iterable, not just sequences (some of the above answers will choke on filehandles)
3) Does not load the chunk into memory all at once
4) Does not make a chunk-long list of references to the same iterator in memory
5) No padding of fill values at the end of the list
That being said, I haven't timed it so it might be slower than some of the more clever methods, and some of the advantages may be irrelevant given the use case.
def chunkiter(iterable, size):
    # Python 2: a StopIteration escaping from .next() ends each generator.
    def inneriter(first, iterator, size):
        yield first
        for _ in xrange(size - 1):
            yield iterator.next()
    it = iter(iterable)
    while True:
        yield inneriter(it.next(), it, size)

In [2]: i = chunkiter('abcdefgh', 3)
In [3]: for ii in i:
   ...:     for c in ii:
   ...:         print c,
   ...:     print ''
   ...:
a b c
d e f
g h
Update:
A couple of drawbacks arise because the inner and outer loops pull values from the same iterator:
1) continue doesn't work as expected in the outer loop: it just continues on to the next item rather than skipping a chunk. However, this doesn't seem like a problem, as there's nothing to test in the outer loop.
2) break doesn't work as expected in the inner loop: control will wind up in the inner loop again with the next item in the iterator. To skip whole chunks, either wrap the inner iterator (ii above) in a tuple, e.g. for c in tuple(ii), or set a flag and exhaust the iterator.
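The shared-iterator behaviour is easier to see in a small experiment. The following is a Python 3 sketch of the same idea, not the code above: it swaps the nested generator for itertools.chain and islice, because on Python 3.7+ a StopIteration leaking out of a generator body becomes a RuntimeError.

```python
from itertools import chain, islice

def chunkiter(iterable, size):
    """Yield lazy chunks; each chunk reads from the one shared iterator."""
    it = iter(iterable)
    for first in it:
        # `first` proves the chunk is non-empty; islice supplies the rest.
        yield chain([first], islice(it, size - 1))

# Consuming each chunk fully gives the expected grouping.
full = [''.join(chunk) for chunk in chunkiter('abcdefgh', 3)]
print(full)  # ['abc', 'def', 'gh']

# Taking only the first item of each chunk (like a break in the inner loop)
# leaves the rest in the shared iterator, so the next "chunk" starts there.
heads = [next(chunk) for chunk in chunkiter('abcdefgh', 3)]
print(heads)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Draining the remainder of a chunk before moving on restores chunk skipping.
skipped = []
for chunk in chunkiter('abcdefgh', 3):
    skipped.append(next(chunk))
    for _ in chunk:  # exhaust the rest of this chunk
        pass
print(skipped)  # ['a', 'd', 'g']
```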
#18
1
def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.

    >>> list(group_by([1,2,3,4,5], 2))
    [[1, 2], [3, 4], [5]]
    """
    sublist = []
    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []
        sublist.append(item)
    if sublist:
        yield sublist
#19
1
You can use the partition or chunks function from the funcy library:
from funcy import partition

for a, b, c, d in partition(4, ints):
    foo += a * b * c * d
These functions also have iterator versions, ipartition and ichunks, which will be more efficient in this case.
You can also peek at their implementation.
#20
1
To avoid all conversions to a list, import itertools and:
>>> for k, g in itertools.groupby(xrange(35), lambda x: x/10):
...     print k, list(g)
Produces:
...
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
2 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
3 [30, 31, 32, 33, 34]
>>>
I checked groupby and it doesn't convert to a list or use len, so I think this will delay resolution of each value until it is actually used. Sadly, none of the available answers (at this time) seemed to offer this variation.
Obviously, if you need to handle each item in turn, nest a for loop over g:
for k, g in itertools.groupby(xrange(35), lambda x: x/10):
    for i in g:
        pass  # do what you need to do with individual items
    # now do what you need to do with the whole group
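One thing to watch: the key function above divides the values themselves, which lines up with chunk boundaries only because xrange(35) yields consecutive integers starting at 0. A minimal sketch for arbitrary items, assuming Python 3, keys on each item's position instead. The helper name batches is made up for illustration.

```python
from itertools import groupby

def batches(iterable, size):
    """Yield lazy groups of up to `size` items, keyed on item position."""
    indexed = enumerate(iterable)
    for _, group in groupby(indexed, key=lambda pair: pair[0] // size):
        # Strip the index again so callers only see the original items.
        yield (item for _, item in group)

words = (w for w in "the quick brown fox jumps over the lazy dog".split())
result = [list(batch) for batch in batches(words, 4)]
print(result)
# [['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'lazy'], ['dog']]
```

Each batch has to be consumed before asking for the next one, because groupby advances the same underlying iterator.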
My specific interest in this was the need to consume a generator to submit changes in batches of up to 1000 to the Gmail API:
messages = a_generator_which_would_not_be_smart_as_a_list
# Group on each message's position; the messages are indexed like dicts
# (message['id']) and cannot themselves be divided by 1000.
for idx, batch in groupby(enumerate(messages), lambda pair: pair[0] // 1000):
    batch_request = BatchHttpRequest()
    for _, message in batch:
        batch_request.add(self.service.users().messages().modify(userId='me', id=message['id'], body=msg_labels))
    http = httplib2.Http()
    self.credentials.authorize(http)
    batch_request.execute(http=http)
#21
1
About the solution given by J.F. Sebastian here:
def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)
It's clever, but has one disadvantage: it always returns tuples. How do you get strings instead?
Of course you can write ''.join(chunk) for each chunk, but the temporary tuple is constructed anyway.
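For readers who haven't met the idiom, here is a small sketch (assuming Python 3, where zip is lazy) of why it groups consecutive items, and of the join workaround mentioned above.

```python
def chunker(iterable, chunksize):
    # The list holds `chunksize` references to ONE iterator, so each tuple
    # that zip builds is filled with consecutive items from it.
    return zip(*[iter(iterable)] * chunksize)

pairs = list(chunker('12345', 2))
print(pairs)  # [('1', '2'), ('3', '4')]

# Joining each chunk gives strings, at the cost of a temporary tuple per chunk.
strings = [''.join(chunk) for chunk in chunker('12345', 2)]
print(strings)  # ['12', '34']
```

Note that zip stops at the first short group, so the trailing '5' is dropped.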
You can get rid of the temporary tuple by writing your own zip, like this:
class IteratorExhausted(Exception):
    pass

def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    raise to  # StopIteration would get ignored because this is a generator,
              # but a custom exception can leave the generator.

def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted:  # when any of the iterators gets exhausted.
            break
Then
def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)
Example usage:
>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'
#22
1
Here is a chunker without imports that supports generators:
def chunks(seq, size):
    it = iter(seq)
    while True:
        ret = tuple(it.next() for _ in range(size))
        if len(ret) == size:
            yield ret
        else:
            return  # the incomplete last chunk is dropped
Example of use:
>>> def foo():
...     i = 0
...     while True:
...         i += 1
...         yield i
...
>>> c = chunks(foo(), 3)
>>> c.next()
(1, 2, 3)
>>> c.next()
(4, 5, 6)
>>> list(chunks('abcdefg', 2))
[('a', 'b'), ('c', 'd'), ('e', 'f')]
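The code above is Python 2 (it.next() and a StopIteration that silently ends the generator expression). A possible Python 3 equivalent, still without imports, catches the StopIteration explicitly. This is a sketch of the same behaviour rather than the original code, and naturals is just an illustrative name for the infinite generator.

```python
def chunks(seq, size):
    """Yield full tuples of `size` items; an incomplete tail is dropped."""
    it = iter(seq)
    while True:
        ret = []
        try:
            for _ in range(size):
                ret.append(next(it))
        except StopIteration:
            return  # fewer than `size` items were left
        yield tuple(ret)

def naturals():
    i = 0
    while True:
        i += 1
        yield i

c = chunks(naturals(), 3)
first, second = next(c), next(c)
print(first, second)  # (1, 2, 3) (4, 5, 6)

pairs = list(chunks('abcdefg', 2))
print(pairs)  # [('a', 'b'), ('c', 'd'), ('e', 'f')]
```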
#23
1
def chunker(iterable, n):
    """Yield iterable in chunk sizes.

    >>> chunks = chunker('ABCDEF', n=4)
    >>> chunks.next()
    ['A', 'B', 'C', 'D']
    >>> chunks.next()
    ['E', 'F']
    """
    it = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(it.next())
            except StopIteration:
                if chunk:  # don't yield an empty trailing chunk
                    yield chunk
                return
        yield chunk

if __name__ == '__main__':
    import doctest
    doctest.testmod()
#24
1
I like this approach. It feels simple and not magical and supports all iterable types and doesn't require imports.
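def chunk_iter(iterable, chunk_size):
    it = iter(iterable)
    while True:
        # zip stops cleanly when `it` runs out; calling next(it) inside a
        # generator expression raises RuntimeError on Python 3.7+ (PEP 479).
        chunk = tuple(item for _, item in zip(range(chunk_size), it))
        if not chunk:
            break
        yield chunk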
#25
0
There doesn't seem to be a pretty way to do this. Here is a page that has a number of methods, including:
def split_seq(seq, size):
    # Note: this splits seq into `size` parts of roughly equal length,
    # not into parts that are `size` items long.
    newseq = []
    splitsize = 1.0/size*len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
    return newseq
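A quick check of what this function actually returns may help, since the size argument here is the number of parts rather than the length of each part. A small sketch, run under Python 3:

```python
def split_seq(seq, size):
    newseq = []
    splitsize = 1.0 / size * len(seq)  # average (fractional) part length
    for i in range(size):
        start = int(round(i * splitsize))
        stop = int(round((i + 1) * splitsize))
        newseq.append(seq[start:stop])
    return newseq

parts = split_seq(list(range(10)), 3)
print(parts)  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
```

So for the four-at-a-time case in the question you would have to pass len(ints) // 4 as size, which only works cleanly when the length is a multiple of 4.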
#26
0
If the lists are the same size, you can combine them into lists of 4-tuples with zip(). For example:
# Four lists of four elements each.
l1 = range(0, 4)
l2 = range(4, 8)
l3 = range(8, 12)
l4 = range(12, 16)
for i1, i2, i3, i4 in zip(l1, l2, l3, l4):
    ...
Here's what the zip() function produces:
>>> print l1
[0, 1, 2, 3]
>>> print l2
[4, 5, 6, 7]
>>> print l3
[8, 9, 10, 11]
>>> print l4
[12, 13, 14, 15]
>>> print zip(l1, l2, l3, l4)
[(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
If the lists are large, and you don't want to combine them into a bigger list, use itertools.izip(), which produces an iterator rather than a list.
from itertools import izip

for i1, i2, i3, i4 in izip(l1, l2, l3, l4):
    ...
#27
0
A one-liner, ad hoc solution to iterate over a list x in chunks of size 4:
for a, b, c, d in zip(x[0::4], x[1::4], x[2::4], x[3::4]):
    ...  # do something with a, b, c and d
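As a rough illustration of how this behaves when the length is not a multiple of 4 (Python 3 sketch): each strided slice is a separate copy, and zip stops at the shortest one, so the incomplete tail is silently dropped. Swapping in itertools.zip_longest keeps the tail and pads it instead.

```python
from itertools import zip_longest

x = list(range(10))  # 10 items: two full groups of 4 plus a tail of 2

groups = list(zip(x[0::4], x[1::4], x[2::4], x[3::4]))
print(groups)  # [(0, 1, 2, 3), (4, 5, 6, 7)]

padded = list(zip_longest(x[0::4], x[1::4], x[2::4], x[3::4], fillvalue=None))
print(padded)  # [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, None, None)]
```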
#28
0
At first, I designed it to split strings into substrings in order to parse a string containing hex.
Today I turned it into a more complex, but still simple, generator.
def chunker(iterable, size, reductor, condition):
    it = iter(iterable)
    def chunk_generator():
        # zip ends the chunk cleanly when `it` runs out; next(it) inside a
        # generator expression raises RuntimeError on Python 3.7+ (PEP 479).
        return (item for _, item in zip(range(size), it))
    chunk = reductor(chunk_generator())
    while condition(chunk):
        yield chunk
        chunk = reductor(chunk_generator())
Arguments:
Obvious ones
- iterable is any iterable / iterator / generator containing / generating / iterating over the input data,
- size is, of course, the size of the chunk you want to get.
More interesting
- reductor is a callable which receives a generator iterating over the content of a chunk. I'd expect it to return a sequence or string, but I don't demand that. You can pass as this argument, for example, list, tuple, set, frozenset, or anything fancier. I'd pass this function, returning a string (provided that iterable contains / generates / iterates over strings):
def concatenate(iterable): return ''.join(iterable)
Note that reductor can cause the generator to close by raising an exception.
- condition is a callable which receives whatever reductor returned. It decides to approve and yield it (by returning anything evaluating to True), or to decline it and finish the generator's work (by returning anything else or raising an exception).
When the number of elements in iterable is not divisible by size, then once it gets exhausted, reductor will receive a generator producing fewer elements than size. Let's call these elements the last elements.
I invented two functions to pass as this argument:
- lambda x: x - the last elements will be yielded.
- lambda x: len(x) == <size> - the last elements will be rejected. Replace <size> with a number equal to size.
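To make the two condition choices concrete, here is a usage sketch for the hex-parsing case, assuming Python 3. One substitution to be aware of: the chunk generator below uses zip(range(size), it) rather than next(it) inside a generator expression, since the latter raises RuntimeError at exhaustion on Python 3.7+. The sample string and variable names are made up for the example.

```python
def chunker(iterable, size, reductor, condition):
    it = iter(iterable)

    def chunk_generator():
        # Lazily yields up to `size` items and ends cleanly at exhaustion.
        return (item for _, item in zip(range(size), it))

    chunk = reductor(chunk_generator())
    while condition(chunk):
        yield chunk
        chunk = reductor(chunk_generator())

def concatenate(iterable):
    return ''.join(iterable)

hex_text = 'deadbeef1'  # nine hex digits: four full pairs plus one leftover
size = 2

# Truthiness as the condition: the short last chunk is yielded too.
keep_tail = list(chunker(hex_text, size, concatenate, lambda x: x))
print(keep_tail)  # ['de', 'ad', 'be', 'ef', '1']

# Length check as the condition: the short last chunk is rejected.
drop_tail = list(chunker(hex_text, size, concatenate, lambda x: len(x) == size))
print(drop_tail)  # ['de', 'ad', 'be', 'ef']

byte_values = [int(pair, 16) for pair in drop_tail]
print(byte_values)  # [222, 173, 190, 239]
```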
#29
0
It is easy to make itertools.groupby work for you to get an iterable of iterables, without creating any temporary lists:
from itertools import count, groupby
groupby(iterable, (lambda x,y: (lambda z: x.next()/y))(count(),100))
Don't get put off by the nested lambdas: the outer lambda runs just once to put the count() generator and the constant 100 into the scope of the inner lambda.
I use this to send chunks of rows to MySQL.
for k,v in groupby(bigdata, (lambda x,y: (lambda z: x.next()/y))(count(),100)):
    cursor.executemany(sql, v)
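The nested lambdas can also be unrolled into a small named helper, which may be easier to read. This is a sketch under the assumption of Python 3 (next(counter) and floor division instead of x.next()/y). The name in_batches and the sample rows are made up.

```python
from itertools import count, groupby

def in_batches(iterable, size):
    """Group consecutive items into lazy batches of up to `size` items."""
    counter = count()
    # groupby calls the key once per item, so next(counter) is the item's index.
    return groupby(iterable, key=lambda _item: next(counter) // size)

rows = ((n, n * n) for n in range(7))
batches = [list(batch) for _, batch in in_batches(rows, 3)]
print(batches)
# [[(0, 0), (1, 1), (2, 4)], [(3, 9), (4, 16), (5, 25)], [(6, 36)]]
```

As with any groupby result, each batch is only valid until the next one is requested.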
#30
0
Quite pythonic here (you may also inline the body of the split_groups function):
import itertools

def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))

foo = 0
for x, y, z, w in split_groups(range(16), 4):
    foo += x * y + z * w