Split a generator/iterable every n items in Python (splitEvery)

Date: 2021-10-13 21:47:27

I'm trying to write the Haskell function 'splitEvery' in Python. Here is its definition:

splitEvery :: Int -> [e] -> [[e]]
    @'splitEvery' n@ splits a list into length-n pieces.  The last
    piece will be shorter if @n@ does not evenly divide the length of
    the list.

The basic version of this works fine, but I want a version that works with generator expressions, lists, and iterators. And, if there is a generator as an input, it should return a generator as an output!

Tests

# should not enter infinite loop with generators or lists
splitEvery(10, itertools.count())
splitEvery(10, range(1000))

# last piece must be shorter if n does not evenly divide
assert splitEvery(5, range(9)) == [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

# should give same correct results with generators
tmp = itertools.islice(itertools.count(), 10)
assert list(splitEvery(5, tmp)) == [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

Current Implementation

Here is the code I currently have, but it doesn't work with a simple list.

import itertools

def splitEvery_1(n, iterable):
    # Bug: when iterable is a plain list, each islice call re-slices from
    # index 0, so this yields the same first n items forever.
    res = list(itertools.islice(iterable, n))
    while len(res) != 0:
        yield res
        res = list(itertools.islice(iterable, n))

This one doesn't work with a generator expression (thanks to jellybean for fixing it):

def splitEvery_2(n, iterable): 
    return [iterable[i:i+n] for i in range(0, len(iterable), n)]
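
A generator expression, for instance, supports neither len() nor slicing, so this version fails with a TypeError:

g = (x**2 for x in range(9))
len(g)    # TypeError: object of type 'generator' has no len()
g[0:5]    # TypeError: 'generator' object is not subscriptable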

There has to be a simple piece of code that does the splitting. I know I could just have different functions, but it seems like it should be an easy thing to do. I'm probably getting stuck on an unimportant problem, but it's really bugging me.

It is similar to grouper from http://docs.python.org/library/itertools.html#itertools.groupby, but I don't want it to fill extra values.

from itertools import izip_longest  # zip_longest in Python 3

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

It does mention a method that truncates the last value. This isn't what I want either.

The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using izip(*[iter(s)]*n).

list(izip(*[iter(range(9))]*5)) == [(0, 1, 2, 3, 4)]
# should be [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

13 Answers

#1


40  

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

Some tests:

>>> list(split_every(5, range(9)))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]

>>> list(split_every(3, (x**2 for x in range(20))))
[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [144, 169, 196], [225, 256, 289], [324, 361]]

>>> [''.join(s) for s in split_every(6, 'Hello world')]
['Hello ', 'world']

>>> list(split_every(100, []))
[]

#2


16  

Here's a quick one-liner version. Like Haskell's, it is lazy.

from itertools import islice, takewhile, repeat
split_every = (lambda n, it:
    takewhile(bool, (list(islice(it, n)) for _ in repeat(None))))

This requires that you apply iter to the argument before calling split_every.

Example:

list(split_every(5, iter(xrange(9))))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]

Although not a one-liner, the version below doesn't require you to call iter first, which can be a common pitfall.

from itertools import islice, takewhile, repeat

def split_every(n, iterable):
    """
    Slice an iterable into chunks of n elements
    :type n: int
    :type iterable: Iterable
    :rtype: Iterator
    """
    iterator = iter(iterable)
    return takewhile(bool, (list(islice(iterator, n)) for _ in repeat(None)))
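
Usage is the same, without the explicit iter call:

>>> list(split_every(3, (x for x in range(7))))
[[0, 1, 2], [3, 4, 5], [6]]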

(Thanks to @eli-korvigo for improvements.)

#3


5  

Building off of the accepted answer and employing a lesser-known use of iter (when passed a second argument, a sentinel, it calls the first argument repeatedly until the sentinel is returned), you can do this really easily:

python3:

from itertools import islice

def split_every(n, iterable):
    iterable = iter(iterable)
    yield from iter(lambda: list(islice(iterable, n)), [])
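
The two-argument (sentinel) form of iter is easy to see in isolation; a small standalone demonstration:

>>> counter = iter(range(4)).__next__
>>> list(iter(counter, 2))  # calls counter() until it returns the sentinel 2
[0, 1]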

python2:

from itertools import islice

def split_every(n, iterable):
    iterable = iter(iterable)
    for chunk in iter(lambda: list(islice(iterable, n)), []):
        yield chunk

#4


3  

A one-liner, inlineable solution (supports v2/v3, iterators, uses only the standard library and a single generator comprehension):

import itertools

def split_groups(iter_in, group_size):
    return ((x for _, x in item)
            for _, item in itertools.groupby(enumerate(iter_in),
                                             key=lambda x: x[0] // group_size))
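
For example, materializing each group in order (groupby shares one underlying iterator, so each inner generator must be consumed before advancing to the next):

>>> [list(g) for g in split_groups(range(9), 5)]
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]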

#5


3  

more_itertools has a chunked function:

import more_itertools as mit


list(mit.chunked(range(9), 5))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

#6


2  

I think those questions are almost the same.

Changing it a little to crop the last chunk, I think a good solution for the generator case would be:

from itertools import islice

def iter_grouper(n, iterable):
    it = iter(iterable)
    # Materialize each chunk: an islice object is always truthy, so testing
    # it directly in the while condition would loop forever.
    item = list(islice(it, n))
    while item:
        yield item
        item = list(islice(it, n))

For objects that support slicing (lists, strings, tuples), we can do:

def slice_grouper(n, sequence):
    return [sequence[i:i+n] for i in range(0, len(sequence), n)]

now it's just a matter of dispatching the correct method:

def grouper(n, iter_or_seq):
    # __getslice__ is Python 2 only; on Python 3, check for __getitem__ instead.
    if hasattr(iter_or_seq, "__getslice__"):
        return slice_grouper(n, iter_or_seq)
    elif hasattr(iter_or_seq, "__iter__"):
        return iter_grouper(n, iter_or_seq)
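
For example, on Python 2 (where lists implement __getslice__):

>>> grouper(5, range(9))              # range returns a list on Python 2
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]
>>> list(grouper(5, iter(range(9))))  # an iterator takes the lazy path
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]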

I think you could polish it a little bit more :-)

#7


1  

Why not do it like this? Looks almost like your splitEvery_2 function.

def splitEveryN(n, it):
    return [it[i:i+n] for i in range(0, len(it), n)]

Actually it only takes away the unnecessary step interval from the slice in your solution. :)

#8


1  

This is an answer that works for both lists and generators:

from itertools import count, groupby

def split_every(size, iterable):
    c = count()
    for k, g in groupby(iterable, lambda x: next(c) // size):
        yield list(g)  # or yield g if you want to output a generator
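
For example:

>>> list(split_every(3, 'ABCDEFG'))
[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]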

#9


1  

I came across this while trying to chop up batches too, but doing it on a generator from a stream, so most of the solutions here either aren't applicable or don't work in Python 3.

For people still stumbling upon this, here's a general solution using itertools:

from itertools import islice, chain

def iter_in_slices(iterator, size=None):
    while True:
        slice_iter = islice(iterator, size)
        # Peek at the first object; when there is none, the iterator is
        # exhausted and we stop. (The explicit return matters: since PEP 479,
        # Python 3.7+, a bare StopIteration inside a generator becomes a
        # RuntimeError.)
        try:
            peek = next(slice_iter)
        except StopIteration:
            return
        # Put the first object back and yield the slice
        yield chain([peek], slice_iter)
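
Each yielded slice is itself an iterator and must be consumed in order, for example:

>>> [list(s) for s in iter_in_slices(iter(range(9)), 5)]
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]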

#10


0  

Here is how you deal with list vs iterator:

def isList(L):  # Implement it somehow - returns True or False, e.g. isinstance(L, list)
    ...

# Index into (list, identity) with the flag: wrap the result in a list
# only when the original input was a list.
return (list, lambda x: x)[int(isList(L))](result)

#11


0  

def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1 = (i*i for i in range(10))
g2 = chunks(g1, 3)
print g2
# <generator object chunks at 0x0337B9B8>
print list(g2)
# [[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]

#12


-1  

This will do the trick:

from itertools import izip_longest  # zip_longest in Python 3
izip_longest(it[::2], it[1::2])

where *it* is some sequence that supports slicing

Example:

izip_longest('abcdef'[::2], 'abcdef'[1::2]) -> ('a', 'b'), ('c', 'd'), ('e', 'f')

Let's break this down:

'abcdef'[::2] -> 'ace'
'abcdef'[1::2] -> 'bdf'

As you can see, the last number in the slice specifies the step used to pick up items. You can read more about using extended slices here.

The zip function takes the first item from the first iterable and combines it with the first item of the second iterable. It then does the same for the second items, the third items, and so on, until one of the iterables runs out of values.

The result is an iterator. If you want a list use the list() function on the result.

#13


-1  

If you want a solution that

  • uses generators only (no intermediate lists or tuples),
  • works for very long (or infinite) iterators,
  • works for very large batch sizes,

this does the trick:

def one_batch(first_value, iterator, batch_size):
    yield first_value
    for i in xrange(1, batch_size):
        yield iterator.next()

def batch_iterator(iterator, batch_size):
    iterator = iter(iterator)
    while True:
        first_value = iterator.next()  # Peek.
        yield one_batch(first_value, iterator, batch_size)

It works by peeking at the next value in the iterator and passing that as the first value to a generator (one_batch()) that will yield it, along with the rest of the batch.

The peek step raises StopIteration exactly when the input iterator is exhausted and there are no more batches. In Python 2 that is the correct moment for batch_iterator() to raise StopIteration, so there is no need to catch the exception. (Since PEP 479, in effect from Python 3.7, a StopIteration escaping a generator is turned into a RuntimeError, so on Python 3 it must be caught and converted into a return.)
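
A rough Python 3 port of the same idea (a sketch: islice replaces the inner loop, and the explicit return satisfies PEP 479):

from itertools import islice

def one_batch(first_value, iterator, batch_size):
    yield first_value
    # Yield up to batch_size - 1 further items from the shared iterator.
    yield from islice(iterator, batch_size - 1)

def batch_iterator(iterator, batch_size):
    iterator = iter(iterator)
    while True:
        try:
            first_value = next(iterator)  # Peek.
        except StopIteration:
            return  # PEP 479: don't let StopIteration escape the generator.
        yield one_batch(first_value, iterator, batch_size)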

This will process lines from stdin in batches:

for input_batch in batch_iterator(sys.stdin, 10000):
    for line in input_batch:
        process(line)
    finalise()

I've found this useful for processing lots of data and uploading the results in batches to an external store.
