What's the most Pythonic way to identify consecutive duplicates in a list?

Time: 2021-10-03 22:54:17

I've got a list of integers and I want to be able to identify contiguous blocks of duplicates: that is, I want to produce an order-preserving list of 2-tuples where each tuple contains (int_in_question, number of occurrences).

For example, if I have a list like:

[0, 0, 0, 3, 3, 2, 5, 2, 6, 6]

I want the result to be:

[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]

I have a fairly simple way of doing this with a for-loop, a temp, and a counter:

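source_list = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
result_list = []
if source_list:  # guard: source_list[0] would raise IndexError on an empty list
    current = source_list[0]
    count = 0
    for value in source_list:
        if value == current:
            count += 1
        else:
            # the run ended: record it and start counting the new value
            result_list.append((current, count))
            current = value
            count = 1
    result_list.append((current, count))  # record the final run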

But I really like Python's functional programming idioms, and I'd like to be able to do this with a simple generator expression. However, I find it difficult to keep sub-counts when working with generators. I have a feeling a two-step process might get me there, but for now I'm stumped.

Is there a particularly elegant/pythonic way to do this, especially with generators?

1 solution

#1



>>> from itertools import groupby
>>> L = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
>>> grouped_L = [(k, sum(1 for _ in g)) for k, g in groupby(L)]
>>> # Or (k, len(list(g))), but that creates an intermediate list
>>> grouped_L
[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]

Batteries included, as they say.

The suggestion to use sum with a generator expression came from JBernardo; see the comment.
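
Since the question asks specifically about generators, here is a minimal sketch of the same groupby idea wrapped in a generator function, so the pairs are produced lazily instead of being collected into a list up front. The name run_lengths is hypothetical and not part of the original answer; the sketch assumes only the standard-library itertools.groupby.

```python
from itertools import groupby


def run_lengths(iterable):
    """Lazily yield (value, run_length) for each run of equal adjacent items."""
    for key, group in groupby(iterable):
        # groupby only merges *adjacent* equal items, so a value that
        # reappears later (like the second 2 below) starts a new run.
        yield key, sum(1 for _ in group)


source = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
pairs = list(run_lengths(source))
print(pairs)
```

Because groupby consumes its input one item at a time, this should also work on an arbitrary iterator (a file, another generator) and on an empty input, where it simply yields nothing.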
