Remove duplicates from a list in one line while preserving order [duplicate]

Date: 2021-02-28 04:40:48

This question already has an answer here:

I have the following list:

['Herb', 'Alec', 'Herb', 'Don']

I want to remove duplicates while keeping the order, so it would be:

['Herb', 'Alec', 'Don']

Here is how I would do this verbosely:

l_new = []
for item in l_old:
    if item not in l_new: l_new.append(item)

Is there a way to do this in a single line?

6 Answers

#1


2  

You could use an OrderedDict, but I suggest sticking with your for-loop.

>>> from collections import OrderedDict
>>> data = ['Herb', 'Alec', 'Herb', 'Don']
>>> list(OrderedDict.fromkeys(data))
['Herb', 'Alec', 'Don']
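
As a side note, on Python 3.7+ a plain dict also preserves insertion order, so the `OrderedDict` import isn't strictly needed; a minimal sketch:

```python
# Python 3.7+ guarantees that dicts preserve insertion order,
# so dict.fromkeys dedupes while keeping first-seen order.
data = ['Herb', 'Alec', 'Herb', 'Don']
result = list(dict.fromkeys(data))
print(result)  # ['Herb', 'Alec', 'Don']
```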

Just to reiterate: I seriously suggest sticking with your for-loop approach, and use a set to keep track of already seen items:

>>> data = ['Herb', 'Alec', 'Herb', 'Don']
>>> seen = set()
>>> unique_data = []
>>> for x in data:
...     if x not in seen:
...         unique_data.append(x)
...         seen.add(x)
...
>>> unique_data
['Herb', 'Alec', 'Don']

And in case you just want to be wacky (seriously don't do this):

>>> [t[0] for t in sorted(dict(zip(reversed(data), range(len(data), -1, -1))).items(), key=lambda t:t[1])]
['Herb', 'Alec', 'Don']

#2


5  

You could use a set to remove duplicates and then restore the ordering. And it's just as slow as your original, yeah :-)

>>> sorted(set(l_old), key=l_old.index)
['Herb', 'Alec', 'Don']

#3


4  

Using pandas, create a series from the list, drop duplicates, and then convert it back to a list.

import pandas as pd

>>> pd.Series(['Herb', 'Alec', 'Herb', 'Don']).drop_duplicates().tolist()
['Herb', 'Alec', 'Don']

Timings

Solution from @StefanPochmann is the clear winner for lists with high duplication.

my_list = ['Herb', 'Alec', 'Don'] * 10000

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.11 ms per loop

%timeit list(OrderedDict().fromkeys(my_list))
# 100 loops, best of 3: 16.1 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1000 loops, best of 3: 396 µs per loop

For larger lists with no duplication (e.g. simply a range of numbers), the pandas solution is very fast.

my_list = range(10000)

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.16 ms per loop

%timeit list(OrderedDict().fromkeys(my_list))
# 100 loops, best of 3: 10.8 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1 loop, best of 3: 716 ms per loop
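
The collapse of the `sorted(set(...), key=my_list.index)` variant on duplicate-free input comes from `list.index`, which scans from the start on every call, so each sort key costs O(n) and the whole thing is O(n²). A rough standard-library-only comparison (the `dedupe` helper is illustrative, not from any of the answers):

```python
import timeit

my_list = list(range(2000))

# O(n^2): list.index scans linearly for every element
quad = timeit.timeit(lambda: sorted(set(my_list), key=my_list.index), number=3)

def dedupe(seq):
    # O(n): set membership tests are O(1) amortized;
    # set.add returns None, so the `or` clause records x as seen
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]

lin = timeit.timeit(lambda: dedupe(my_list), number=3)
# `quad` comes out orders of magnitude larger than `lin`
```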

#4


2  

If you really don't care about optimizations and such, you can use the following:

s = ['Herb', 'Alec', 'Herb', 'Don']
[x[0] for x in zip(s, range(len(s))) if x[0] not in s[:x[1]]]
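
The zip-with-range pairing above can be written more idiomatically with `enumerate`; this is the same quadratic idea (the slice rebuilds a prefix list at every step), just easier to read:

```python
s = ['Herb', 'Alec', 'Herb', 'Don']
# keep x only if it did not appear earlier in the list;
# s[:i] copies the prefix each time, so this is still O(n^2)
result = [x for i, x in enumerate(s) if x not in s[:i]]
print(result)  # ['Herb', 'Alec', 'Don']
```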

Note that, in my opinion, you really should use the for loop from your question, or the answer by @juanpa.arrivillaga.

#5


0  

You can try this:

l = ['Herb', 'Alec', 'Herb', 'Don']
data = [i[-1] for i in sorted([({a:i for i, a in enumerate(l)}[a], a) for a in set({a:i for i, a in enumerate(l)}.keys())], key = lambda x: x[0])]

Output:

['Alec', 'Herb', 'Don']

Note that this keeps the last occurrence of each value rather than the first, which is why 'Alec' comes before 'Herb' here.

#6


0  

l_new = []
for item in l_old:
    if item not in l_new: l_new.append(item)

In one line..ish:

l_new = []

[ l_new.append(item)  for item in l_old if item not in l_new]

Which has the behavior:

> a = [1, 1, 2, 2, 3, 3, 4, 5, 5]
> b = []
> [b.append(item) for item in a if item not in b]
> print(b)
[1, 2, 3, 4, 5]
