Python: an elegant way to do dual/multiple iteration over the same list

Time: 2022-09-10 20:19:06

I've written a bit of code like the following to compare items with other items further on in a list. Is there a more elegant pattern for this sort of dual iteration?

jump_item_iter = (j for j in items if some_cond(j))
try:
    jump_item = next(jump_item_iter)
except StopIteration:
    return
for item in items:
    if jump_item is item:
        try:
            jump_item = next(jump_item_iter)
        except StopIteration:
            return
    # do lots of stuff with item and jump_item

I don't think the "except StopIteration" is very elegant

Edit:

To hopefully make it clearer, I want to visit each item in a list and pair it with the next item further on in the list (jump_item) which satisfies some_cond.

17 solutions

#1


As far as I can see, none of the existing solutions works on a general one-shot, possibly infinite iterator; all of them seem to require an iterable that can be traversed more than once.

Here's a solution to that.

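def batch_by(condition, seq):
    it = iter(seq)
    try:
        batch = [next(it)]
    except StopIteration:  # empty input: nothing to pair
        return
    for jump_item in it:
        if condition(jump_item):
            # pair every buffered item with the jump item that follows it
            for item in batch:
                yield item, jump_item
            batch = []
        batch.append(jump_item)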

This will easily work on infinite iterators:

from itertools import count, islice
is_prime = lambda n: n > 1 and all(n % div for div in range(2, n))
print(list(islice(batch_by(is_prime, count()), 100)))

This will print the first 100 integers, each paired with the prime number that follows it.
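To see the behaviour on a finite, one-shot source, here is a small sketch. The generator read_once and the predicate is_even are made up for illustration; batch_by is repeated so the snippet runs on its own. Note that items after the last match are silently dropped.

```python
def batch_by(condition, seq):
    """Pair each item with the next later item for which condition() is true."""
    it = iter(seq)
    try:
        batch = [next(it)]
    except StopIteration:
        return  # empty input
    for jump_item in it:
        if condition(jump_item):
            for item in batch:
                yield item, jump_item
            batch = []
        batch.append(jump_item)


def read_once():
    # Hypothetical one-shot source: a generator can only be consumed once.
    for n in [4, 7, 1, 10, 3, 2]:
        yield n


is_even = lambda n: n % 2 == 0
pairs = list(batch_by(is_even, read_once()))
print(pairs)  # [(4, 10), (7, 10), (1, 10), (10, 2), (3, 2)]
```

The trailing 2 has no later even number, so it never shows up as the first element of a pair.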

#2


I have no idea what compare() is doing, but 80% of the time, you can do this with a trivial dictionary or pair of dictionaries. Jumping around in a list is a kind of linear search. Linear Search -- to the extent possible -- should always be replaced with either a direct reference (i.e., a dict) or a tree search (using the bisect module).
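One possible reading of the bisect suggestion, as a rough sketch: collect the positions of the matching items once, then binary-search for the first position past the current index. The function name pair_with_next_match and the predicate some_cond are assumptions for illustration, not something from the question.

```python
from bisect import bisect_right


def pair_with_next_match(items, some_cond):
    # Positions of the items that satisfy the condition, in increasing order.
    match_positions = [i for i, item in enumerate(items) if some_cond(item)]
    for i, item in enumerate(items):
        # Index into match_positions of the first position strictly greater than i.
        k = bisect_right(match_positions, i)
        if k == len(match_positions):
            break  # no matching item after position i, nor after any later one
        yield item, items[match_positions[k]]


result = list(pair_with_next_match(list(range(10)), lambda x: x % 2))
print(result)
```

This needs a sequence that supports indexing, so it does not apply to one-shot iterators.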

#3


How about this?

paired_values = []
for elmt in reversed(items):
    if some_cond(elmt):
        current_val = elmt
    try:
        paired_values.append(current_val)
    except NameError:  # trailing elements of items with no matching element after them
        pass
paired_values.reverse()

for item, jump_item in zip(items, paired_values):  # zip() truncates to len(paired_values)
    pass  # do lots of stuff with item and jump_item

An element that satisfies the condition is used as its own jump_item; for example, if the first element of items matches, it is paired with itself. This is the only difference from your original code (and you might want this behavior).

#4


The following iterator is time- and memory-efficient:

def jump_items(items):
    number_to_be_returned = 0
    for elmt in items:
        if some_cond(elmt):
            # emit elmt once for every item waiting for a jump item
            for i in range(number_to_be_returned):
                yield elmt
            number_to_be_returned = 1
        else:
            number_to_be_returned += 1

for item, jump_item in zip(items, jump_items(items)):
    pass  # do lots of stuff with item and jump_item

Note that you may actually want to set the first number_to_be_returned to 1...

#5


Write a generator function:

def myIterator(someValue):
    yield (someValue[0], someValue[1])

for element1, element2 in myIterator(array):
     # do something with those elements.

#6


I have no idea what you're trying to do with that code. But I'm 99% certain that whatever it is could probably be done in 2 lines. I also get the feeling that the '==' operator should be an 'is' operator, otherwise what is the compare() function doing? And what happens if the item returned from the second jump_iter.next call also equals 'item'? It seems like the algorithm would do the wrong thing since you'll compare the second and not the first.

#7


So you want to compare pairs of items in the same list, the second item of the pair having to meet some condition. Normally, when you want to compare pairs in a list, you use zip (or itertools.izip on Python 2):

for item1, item2 in zip(items, items[1:]):
    compare(item1, item2)

Figure out how to fit your some_cond in this :)

#8


Are you basically trying to compare every item in the iterator with every other item in the original list?

To my mind this should just be a case of using two loops, rather than trying to fit it into one.

filtered_items = (j for j in items if some_cond(j))
for filtered in filtered_items:
    for item in items:
        if filtered != item:
            compare(filtered, item)

#9


My first answer was wrong because I didn't quite understand what you were trying to achieve. So if I understand correctly (this time, I hope), you want the main for item in items: to "chase" after an iterator that filters out some items. Well, there's not much you can do, except maybe wrap this into a chase_iterator(iterable, some_cond) generator, which would make your main code a little more readable.
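A minimal sketch of what such a chase_iterator generator might look like. Two assumptions that are not in the original: it materializes the input into a list (two passes are needed), and it compares positions rather than using `is`, so duplicate values in the list cannot confuse it.

```python
def chase_iterator(iterable, some_cond):
    """Yield (item, jump_item): each item with the next later item satisfying some_cond."""
    items = list(iterable)  # two passes are needed, so materialize first
    jump_positions = (i for i, j in enumerate(items) if some_cond(j))
    jump_pos = next(jump_positions, None)
    for pos, item in enumerate(items):
        if jump_pos == pos:
            # The main loop caught up with the jump item: move on to the next match.
            jump_pos = next(jump_positions, None)
        if jump_pos is None:
            return  # no matching item further on in the list
        yield item, items[jump_pos]


for item, jump_item in chase_iterator(range(5), lambda x: x % 2):
    print(item, jump_item)
```

With this wrapper the main loop no longer needs any try/except StopIteration.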

Maybe a more readable approach would be an "accumulator approach" (if the order of the compare() calls doesn't matter), like:

others = []
for item in items:
    if some_cond(item):
        for other in others:
            compare(item, other)
        others = []
    else:
        others.append(item)

(man, I'm beginning to hate Stack Overflow... too addictive...)

#10


for i in range(len(items)):
    for j in range(i + 1, len(items)):
        if some_cond(items[j]):
            # items[i] is item, items[j] is jump_item
            # do something
            break  # stop at the first matching item after position i

#11


Even better using itertools.groupby:

import itertools

def h(lst, cond):
  # assumes lst is sorted in ascending order
  remain = lst
  for last in (l for l in lst if cond(l)):
    group = itertools.groupby(remain, key=lambda x: x < last)
    for start in next(group)[1]:
      yield start, last
    remain = list(next(group)[1])

Usage:

lst = range(10)
cond = lambda x: x % 2
print(list(h(lst, cond)))

will print

[(0, 1), (1, 3), (2, 3), (3, 5), (4, 5), (5, 7), (6, 7), (7, 9), (8, 9)]

#12


With just iterators

import itertools

def f(lst, some_cond):
    jump_item_iter = (j for j in lst if some_cond(j))
    pairs = zip(lst, lst[1:])
    for last in jump_item_iter:
        for start, start_next in itertools.takewhile(lambda pair: pair[0] < last, pairs):
            yield start, last
        # takewhile consumed the pair starting at `last`; put a stand-in back
        pairs = itertools.chain([(start_next, 'dummy')], pairs)

With the input range(10) and some_cond = lambda x: x % 2, this gives [(0, 1), (1, 3), (2, 3), (3, 5), (4, 5), (5, 7), (6, 7), (7, 9), (8, 9)] (the same as your example).

#13


Maybe it is too late, but what about:

l = [j for j in items if some_cond(j)]
for item, jump_item in zip(l, l[1:]):
    # do lots of stuff with item and jump_item

If l = [j for j in range(10) if j%2 ==0] then the iteration is over: [(0, 2),(2, 4),(4, 6),(6, 8)].

#14


You could write your loop body as:

import itertools, functools, operator

for item in items:
    jump_item_iter = itertools.dropwhile(functools.partial(operator.is_, item), 
                                         jump_item_iter)

    # do something with item and jump_item_iter

dropwhile returns an iterator that skips the leading elements that match the condition (here "is item") and then yields everything from the first non-matching element onward.
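A tiny illustration of that behaviour, using placeholder objects (item and other are made up for the example). Only the leading run is dropped; a later element that matches again is kept.

```python
import functools
import operator
from itertools import dropwhile

item = object()
other = object()

# Drop only the leading run of elements that are `item`.
remaining = list(dropwhile(functools.partial(operator.is_, item),
                           [item, item, other, item]))

print(remaining == [other, item])  # prints True
```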

#15


You could put the whole iteration into a single try structure; that way it would be clearer:

jump_item_iter = (j for j in items if some_cond(j))
try:
    jump_item = next(jump_item_iter)
    for item in items:
        if jump_item is item:
            jump_item = next(jump_item_iter)

        # do lots of stuff with item and jump_item

except StopIteration:
    pass

#16


Here is one simple solution that might look a little cleaner:

for i, item in enumerate(items):
    for next_item in items[i+1:]:
        if some_cond(next_item):
            break
    else:
        break  # no later item satisfies the condition
    # do some stuff with both items

The disadvantage is that you check the condition for next_item multiple times. But you can easily optimize this:

cond_items = [item if some_cond(item) else None for item in items]
for i, item in enumerate(items):
    for next_item in cond_items[i+1:]:
        if next_item is not None:
            break
    else:
        break  # no later item satisfies the condition
    # do some stuff with both items

However, both solutions carry more overhead than the original solution from the question. And when you start using counters to work around this then I think it is better to use the iterator interface directly (as in the original solution).

#17


You could do something like:

import itertools

def matcher(iterable, compare):
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        # split off an independent copy to scan the rest without consuming it
        iterator, iterator2 = itertools.tee(iterator)
        for item2 in iterator2:
            if compare(item, item2):
                yield item, item2

but it's quite elaborate (and actually not very efficient), and it would be simpler if you just did a

items= list(iterable)

and then just write two loops over items.

Obviously, this won't work with infinite iterables, but your specification can only work on finite iterables.
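The simpler alternative mentioned above (materialize the iterable, then loop twice) could look roughly like this. It is only a sketch: the name matcher_from_list is hypothetical, and like matcher() it yields every pair of an item and a later item for which compare() is true.

```python
def matcher_from_list(iterable, compare):
    items = list(iterable)  # only works for finite iterables
    for i, item in enumerate(items):
        # Inner loop: every item further on in the list.
        for item2 in items[i + 1:]:
            if compare(item, item2):
                yield item, item2


found = list(matcher_from_list([3, 1, 2], lambda a, b: a > b))
print(found)  # [(3, 1), (3, 2)]
```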
