Remove duplicate values and sum the corresponding column values

Time: 2021-09-05 13:02:48

I have a list from which I need to remove duplicate values and sum the corresponding column values. The list is:

lst = [['20150815171000', '1', '2'],
       ['20150815171000', '2', '3'],
       ['20150815172000', '3', '4'],
       ['20150815172000', '4', '5'],
       ['20150815172000', '5', '6'],
       ['20150815173000', '6', '7']]

Now I need to traverse the list and get output like this:

lst2 = [['20150815171000', '3', '5'], 
        ['20150815172000', '12', '15'], 
        ['20150815173000', '6', '7']]

How could this be done? I have tried writing the code shown below, but it only compares consecutive values, not all the matching ones.

    lst2 = []
    ws = wr = power = 0
    for i in range(len(lst)):
        if lst[i][0] == lst[i+1][0]:
            time = lst[i][0]
            ws = (float(lst[i][1])+float(lst[i+1][1]))
            wr = (float(lst[i][2])+float(lst[i+1][2]))      
        else:
           time = lst[i][0]
           ws = lst[i][1]
           wr = lst[i][2]
        lst2.append([time, ws, wr, power])

Can anyone let me know how I can do this?

5 Solutions

#1


5  

I would use itertools.groupby, grouping on the first element of each inner list.

I would first sort the list on the first element and then group on it. If the list is already sorted on that element, you do not need to sort again and can group directly.

Example -

import itertools

new_lst = []
for k,g in itertools.groupby(sorted(lst,key=lambda x:x[0]) , lambda x:x[0]):
    l = list(g)
    new_lst.append([k,str(sum([int(x[1]) for x in l])), str(sum([int(x[2]) for x in l]))])

Demo -

>>> import itertools
>>>
>>> lst = [['20150815171000', '1', '2'],
...        ['20150815171000', '2', '3'],
...        ['20150815172000', '3', '4'],
...        ['20150815172000', '4', '5'],
...        ['20150815172000', '5', '6'],
...        ['20150815173000', '6', '7']]
>>>
>>> new_lst = []
>>> for k,g in itertools.groupby(sorted(lst,key=lambda x:x[0]) , lambda x:x[0]):
...     l = list(g)
...     new_lst.append([k,str(sum([int(x[1]) for x in l])), str(sum([int(x[2]) for x in l]))])
...
>>> new_lst
[['20150815171000', '3', '5'], ['20150815172000', '12', '15'], ['20150815173000', '6', '7']]
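The sorting step matters because itertools.groupby only merges adjacent items that share a key. A minimal sketch of the difference, assuming a small made-up input (unsorted_rows and the helper group_sum are hypothetical names, not from the answer):

```python
import itertools

# Hypothetical sample: the key 'b' appears in two separate runs.
unsorted_rows = [['b', '1', '2'],
                 ['a', '3', '4'],
                 ['b', '5', '6']]

def group_sum(rows):
    """Sum columns 1 and 2 per key, grouping the rows in the order given."""
    result = []
    for key, group in itertools.groupby(rows, key=lambda row: row[0]):
        members = list(group)
        result.append([key,
                       str(sum(int(r[1]) for r in members)),
                       str(sum(int(r[2]) for r in members))])
    return result

# Without sorting, 'b' yields two groups because its rows are not adjacent.
without_sort = group_sum(unsorted_rows)
# After sorting on the key, each key yields exactly one group.
with_sort = group_sum(sorted(unsorted_rows, key=lambda row: row[0]))

print(without_sort)
print(with_sort)
```

If the input order cannot be guaranteed, sorting first (or using a dictionary, as in the other answers) avoids the split groups.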

#2


3  

You could use a dictionary to manage the unique entries in your list. For each row, check whether its key is already in the dict. If it is, add the values to the existing entry; otherwise add a new entry to the dict.

Try this:

#!/usr/bin/env python3

sums = dict()
for key, *values in lst:
    try:
        # add to an already present entry in the dict
        sums[key] = [int(x)+y for x, y in zip(values, sums[key])]
    except KeyError:
        # if the entry is not already present add it to the dict
        # and cast the values to int to make the adding easier
        sums[key] = list(map(int, values))

# build the output list from dictionary
# also cast back the values to strings
lst2 = sorted([[key]+list(map(str, values)) for key, values in sums.items()])

The sorted() call in the last line is optional, depending on whether you need the output list to be sorted by the dict keys.

Note that this should work for any number of values after the key.
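To illustrate the point about any number of values, here is a small sketch of the same dictionary idea wrapped in a function and run on rows with three value columns. The function name sum_by_key and the extra column are assumptions for illustration only; the output order relies on dicts preserving insertion order (Python 3.7+).

```python
def sum_by_key(rows):
    """Sum every column after the first, grouped by the first column."""
    sums = {}
    for key, *values in rows:
        numbers = [int(v) for v in values]
        if key in sums:
            # Key seen before: add column by column to the running totals.
            sums[key] = [a + b for a, b in zip(sums[key], numbers)]
        else:
            # First time this key appears: start its running totals.
            sums[key] = numbers
    # Convert the totals back to strings to match the input format.
    return [[key] + [str(n) for n in totals] for key, totals in sums.items()]

# Hypothetical input with three value columns instead of two.
rows = [['20150815171000', '1', '2', '10'],
        ['20150815171000', '2', '3', '20'],
        ['20150815172000', '3', '4', '30']]
result = sum_by_key(rows)
print(result)
```

All rows are assumed to have the same number of columns; zip would silently truncate to the shorter row otherwise.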

#3


2  

Alternatively, I would suggest using pandas, which makes this quite straightforward with groupby and sum. Here is one way to do it:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(
[['20150815171000', '1', '2'],
 ['20150815171000', '2', '3'],
 ['20150815172000', '3', '4'],
 ['20150815172000', '4', '5'],
 ['20150815172000', '5', '6'],
 ['20150815173000', '6', '7']],
columns=['group', 'field1', 'field2'])

In [3]: df
Out[3]:
            group field1 field2
0  20150815171000      1      2
1  20150815171000      2      3
2  20150815172000      3      4
3  20150815172000      4      5
4  20150815172000      5      6
5  20150815173000      6      7

# need to convert from '1', '2'... to integer type
In [4]: df['field1'] = df['field1'].astype('int')

In [5]: df['field2'] = df['field2'].astype('int')

# this groupby(to_group_field) and sum() can achieve what you want
In [6]: df.groupby('group').sum()
Out[6]:
                field1  field2
group
20150815171000       3       5
20150815172000      12      15
20150815173000       6       7

# convert to the list of lists format as you expected
In [7]: df.groupby('group').sum().reset_index().values.tolist()
Out[7]:
[['20150815171000', 3, 5],
 ['20150815172000', 12, 15],
 ['20150815173000', 6, 7]]

Hope this helps.

#4


2  

A clean version using a dictionary, lambda, and sorted(), without additional libraries.

lst = [['20150815171000', '1', '2'],
       ['20150815171000', '2', '3'],
       ['20150815172000', '3', '4'],
       ['20150815172000', '4', '5'],
       ['20150815172000', '5', '6'],
       ['20150815173000', '6', '7']]

dct = dict()
for a, b, c in lst:
    if a not in dct: 
        dct[a] = [b, c] 
    else: 
        dct[a] = list(map(lambda x, y: str(int(x)+int(y)), dct[a], [b,c]))
lst2 = sorted([[k,v[0],v[1]] for k,v in dct.items()])

print(lst2)

Out:

[['20150815171000', '3', '5'], 
['20150815172000', '12', '15'], 
['20150815173000', '6', '7']]

#5


1  

As commented on your question, I would also suggest using a dictionary. I'm not a good programmer and there are certainly better ways, but this works:

dct = dict()
for x, y, z in lst:
    if x not in dct:
        dct[x] = [y, z]
    else:
        dct[x] = [str(int(dct[x][0]) + int(y)), str(int(dct[x][1]) + int(z))]
lst2 = []
for k, v in dct.items():
    lst2.append([k, v[0], v[1]])

You are basically just iterating over the list, adding a new item to the dictionary if the key (e.g. '20150815171000') doesn't exist yet, and otherwise updating the corresponding values. In the end you just create another list out of the results in the dictionary.
