Python: Group repeated elements of an existing list into a new list

Date: 2022-09-20 07:39:26

I have a list of the form:

>>> my_list = ['BLA1', 'BLA2', 'BLA3', 'ELE1', 'ELE2', 'ELE3', 'PRI1', 'PRI2', 'NEA1', 'NEA2', 'MAU1', 'MAU2', 'MAU3']

and I want to create a new list that groups the repeated elements (those sharing the same three-letter prefix) into sublists, so that at the end I have:

>>> new_list = [['BLA1', 'BLA2', 'BLA3'], ['ELE1', 'ELE2', 'ELE3'], ['PRI1', 'PRI2'], ['NEA1', 'NEA2'], ['MAU1', 'MAU2', 'MAU3']]

3 solutions

#1


Use itertools.groupby:

import itertools

[list(group) for key, group in itertools.groupby(my_list, key=lambda v: v[:3])]

The key argument is needed here to extract just the part of each value that you want to group on, namely the first 3 characters.

Result:

>>> my_list = ['BLA1', 'BLA2', 'BLA3', 'ELE1', 'ELE2', 'ELE3', 'PRI1', 'PRI2', 'NEA1', 'NEA2', 'MAU1', 'MAU2', 'MAU3']
>>> [list(group) for key, group in itertools.groupby(my_list, key=lambda v: v[:3])]
[['BLA1', 'BLA2', 'BLA3'], ['ELE1', 'ELE2', 'ELE3'], ['PRI1', 'PRI2'], ['NEA1', 'NEA2'], ['MAU1', 'MAU2', 'MAU3']]
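To make the role of the key function more visible, the sketch below keeps the key that groupby yields next to each group and collects the pairs into a dict keyed by prefix. The dict comprehension and the name groups_by_prefix are illustrative additions, not part of the answer, and the sketch assumes the input is already ordered so that equal prefixes are adjacent.

```python
import itertools

my_list = ['BLA1', 'BLA2', 'BLA3', 'ELE1', 'ELE2', 'ELE3',
           'PRI1', 'PRI2', 'NEA1', 'NEA2', 'MAU1', 'MAU2', 'MAU3']

# key=lambda v: v[:3] maps e.g. 'BLA1' -> 'BLA', so consecutive items
# sharing a 3-character prefix end up in the same group.
# groups_by_prefix is a hypothetical name used only for this sketch.
groups_by_prefix = {
    key: list(group)
    for key, group in itertools.groupby(my_list, key=lambda v: v[:3])
}

print(groups_by_prefix['PRI'])  # ['PRI1', 'PRI2']
print(list(groups_by_prefix))   # ['BLA', 'ELE', 'PRI', 'NEA', 'MAU']
```

If the same prefix occurs in two separate runs, the later run overwrites the earlier one in the dict, so sort the input first when it is not already ordered.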

groupby combines consecutive elements with equal keys into one group. If you have disjoint groups (the same key, but with other values in between), it will create separate groups for them:

>>> my_list = ['a1', 'a2', 'b1', 'b2', 'a3', 'a4']
>>> [list(group) for key, group in itertools.groupby(my_list, key=lambda v: v[0])]
[['a1', 'a2'], ['b1', 'b2'], ['a3', 'a4']]

If that is not what you want, you will have to sort my_list first, using the same key.
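A minimal sketch of the sort-first approach, assuming you want every element with the same key in a single group regardless of where it appears. The key detail is to sort with the same key function that you pass to groupby; sorted() is stable, so items keep their original relative order within each group. The helper first_char is a hypothetical name used only here.

```python
import itertools

my_list = ['a1', 'a2', 'b1', 'b2', 'a3', 'a4']

def first_char(v):
    # Hypothetical grouping key for this sketch: the first character.
    return v[0]

# Sort by the same key used for grouping so that equal keys become adjacent.
merged = [list(group)
          for key, group in itertools.groupby(sorted(my_list, key=first_char),
                                              key=first_char)]

print(merged)  # [['a1', 'a2', 'a3', 'a4'], ['b1', 'b2']]
```

Note that the groups now come out in key order rather than in order of first appearance.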

#2


Make sure the list is sorted, then use itertools.groupby.
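When using itertools.groupby this way, keep in mind that each group it hands out is a lazy iterator that shares the underlying input with the groupby object, so it has to be consumed (for example with list(group)) before groupby moves on to the next group. The sketch below reuses the small a/b list from the first answer to show what happens if the (key, group) pairs are collected first; the variable names are illustrative.

```python
import itertools

my_list = ['a1', 'a2', 'b1', 'b2', 'a3', 'a4']

# Consuming each group immediately works as expected.
eager = [list(g) for _, g in itertools.groupby(my_list, key=lambda v: v[0])]

# Materialising the (key, group) pairs first advances groupby past every
# group, so each saved group iterator no longer yields anything.
pairs = list(itertools.groupby(my_list, key=lambda v: v[0]))
lazy = [list(g) for _, g in pairs]

print(eager)  # [['a1', 'a2'], ['b1', 'b2'], ['a3', 'a4']]
print(lazy)   # [[], [], []]
```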

#3


As an alternative to groupby, you could use collections.Counter:

In [40]: from collections import Counter

In [41]: [[k] * v for k, v in Counter(s[:3] for s in my_list).items()]
Out[41]: 
[['BLA', 'BLA', 'BLA'],
 ['ELE', 'ELE', 'ELE'],
 ['PRI', 'PRI'],
 ['NEA', 'NEA'],
 ['MAU', 'MAU', 'MAU']]

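Unlike groupby, this works without sorting the list, even if the elements are all jumbled up. Note that it rebuilds each group from the prefix and its count, so the original elements (such as 'BLA1') are not kept.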
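If you need the original elements rather than repeated prefixes, a dict of lists is another way to group jumbled input without sorting. This sketch uses collections.defaultdict, a swapped-in technique rather than part of the answer above, and a hypothetical jumbled input list; it relies on dicts preserving insertion order (Python 3.7+), so the groups come out in order of first appearance.

```python
from collections import defaultdict

# Hypothetical jumbled input for this sketch.
my_list = ['BLA1', 'ELE1', 'BLA2', 'PRI1', 'ELE2', 'BLA3', 'PRI2']

# Map each 3-character prefix to the list of elements that share it.
groups = defaultdict(list)
for item in my_list:
    groups[item[:3]].append(item)

new_list = list(groups.values())
print(new_list)  # [['BLA1', 'BLA2', 'BLA3'], ['ELE1', 'ELE2'], ['PRI1', 'PRI2']]
```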
