Remove all the elements that occur in one list from another

Time: 2021-01-17 08:00:15

Let's say I have two lists, l1 and l2. I want to perform l1 - l2, which returns all elements of l1 not in l2.

I can think of a naive loop approach to doing this, but that is going to be really inefficient. What is a pythonic and efficient way of doing this?

As an example, if I have l1 = [1,2,6,8] and l2 = [2,3,5,8], l1 - l2 should return [1,6]

6 solutions

#1


262  

Python has a language feature called List Comprehensions that is perfectly suited to making this sort of thing extremely easy. The following statement does exactly what you want and stores the result in l3:

l3 = [x for x in l1 if x not in l2]

l3 will contain [1, 6].

Hope this helps!

#2


76  

One way is to use sets:

>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])
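The session above shows Python 2 output; Python 3 prints the same set as {1, 6}. A minimal sketch of the same idea, assuming the elements are hashable and that sorting is an acceptable way to get a predictable list back (the sorting step is my addition, not part of the answer):

```python
l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]

# Set difference; set.difference() also accepts any iterable directly.
diff = set(l1) - set(l2)
print(diff == set(l1).difference(l2))  # True

# A set has no defined order, so sort it if a predictable list is needed.
result = sorted(diff)
print(result)  # [1, 6]
```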

#3


30  

Expanding on Donut's answer and the other answers here, you can get even better results by using a generator expression instead of a list comprehension, and by using a set data structure (since the in operator is O(n) on a list but O(1) on average on a set).

So here's a function that would work for you:

def filter_list(full_list, excludes):
    s = set(excludes)
    return (x for x in full_list if x not in s)

The result will be an iterable that will lazily fetch the filtered list. If you need a real list object (e.g. if you need to do a len() on the result), then you can easily build a list like so:

filtered_list = list(filter_list(full_list, excludes))
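One consequence of the lazy result is worth seeing once: the generator can only be consumed a single time. A short sketch using the question's example data (the variable names lazy, first_pass and second_pass are mine, for illustration only):

```python
def filter_list(full_list, excludes):
    s = set(excludes)  # build the lookup set once, up front
    return (x for x in full_list if x not in s)

lazy = filter_list([1, 2, 6, 8], [2, 3, 5, 8])

first_pass = list(lazy)   # consumes the generator
second_pass = list(lazy)  # nothing is left to yield

print(first_pass)   # [1, 6]
print(second_pass)  # []
```

If the result has to be iterated more than once, building the list immediately is the safer choice. Note also that set(excludes) requires the excluded items to be hashable.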

#4


23  

Use the Python set type. That would be the most Pythonic. :)

Also, since it's native, it should be the most optimized method too.

See:

http://docs.python.org/library/stdtypes.html#set

http://docs.python.org/library/sets.htm (for older Python versions)

# Using Python 2.7 set literal format.
# Otherwise, use: l1 = set([1,2,6,8])
#
l1 = {1,2,6,8}
l2 = {2,3,5,8}
l3 = l1 - l2

#5


10  

As an alternative, you may also use filter with a lambda expression to get the desired result. For example:

>>> l1 = [1,2,6,8]
>>> l2 = set([2,3,5,8])

#     v  `filter` returns an iterator object (in Python 3). Here I'm converting
#     v  it to `list` in order to display the resulting value
>>> list(filter(lambda x: x not in l2, l1))
[1, 6]
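A variation on the same idea, offered as a sketch rather than as part of this answer: Python 3's itertools.filterfalse keeps the items for which the predicate is false, so the set's own membership test can be passed in without a lambda.

```python
from itertools import filterfalse

l1 = [1, 2, 6, 8]
l2 = {2, 3, 5, 8}

# filterfalse drops every x for which l2.__contains__(x) is true,
# i.e. every element of l1 that is also in l2.
result = list(filterfalse(l2.__contains__, l1))
print(result)  # [1, 6]
```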

Performance Comparison

Here I am comparing the performance of all the answers mentioned here. As expected, Arkku's set-based operation is the fastest.

  • Arkku's Set Difference - First (0.124 usec per loop)

    mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2"
    10000000 loops, best of 3: 0.124 usec per loop
    
  • Daniel Pryden's List Comprehension with set lookup - Second (0.302 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]"
    1000000 loops, best of 3: 0.302 usec per loop
    
  • Donut's List Comprehension on plain list - Third (0.552 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]"
    1000000 loops, best of 3: 0.552 usec per loop
    
  • Moinuddin Quadri's filter approach - Fourth (0.972 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)"
    1000000 loops, best of 3: 0.972 usec per loop
    
  • Akshay Hazari's combination of reduce + filter - Fifth (3.97 usec per loop)

    mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)"
    100000 loops, best of 3: 3.97 usec per loop
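The timings above use four-element lists, where the O(n) versus O(1) lookup difference barely shows. A rough sketch for repeating the list-versus-set comparison on larger inputs with the timeit module; the sizes and the helper names with_list and with_set are assumptions chosen for illustration, and the numbers will differ from machine to machine:

```python
import timeit

# Hypothetical input sizes; the original measurements used 4-element lists.
l1 = list(range(2000))
l2 = list(range(0, 2000, 2))
s2 = set(l2)

def with_list():
    # Each membership test scans l2: roughly len(l1) * len(l2) comparisons.
    return [x for x in l1 if x not in l2]

def with_set():
    # Each membership test is a hash lookup: roughly len(l1) steps on average.
    return [x for x in l1 if x not in s2]

# Both variants must agree before their speed is compared.
assert with_list() == with_set()

t_list = timeit.timeit(with_list, number=5)
t_set = timeit.timeit(with_set, number=5)
print("list lookup: %.4f s, set lookup: %.4f s" % (t_list, t_set))
```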
    

PS: Sets do not maintain order, and they drop duplicate elements. Hence, do not use set difference if you need to preserve either of these.
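To make that caveat concrete, here is a small sketch with a made-up input (the repeated 1 and the unsorted order are my additions, not data from the question):

```python
l1 = [8, 1, 6, 1, 2]   # hypothetical input: unsorted, with 1 repeated
l2 = [2, 3, 5, 8]

# Set difference: correct membership, but the repeated 1 and the
# original ordering are lost.
as_set = set(l1) - set(l2)
print(as_set == {1, 6})  # True

# Comprehension with a set used only for lookups: order and
# duplicates from l1 are preserved.
excludes = set(l2)
kept = [x for x in l1 if x not in excludes]
print(kept)  # [1, 6, 1]
```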

#6


7  

Alternate solution:

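from functools import reduce  # built in on Python 2; must be imported on Python 3
list(reduce(lambda x, y: filter(lambda z: z != y, x), [2,3,5,8], [1,2,6,8]))  # list() because filter is lazy on Python 3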
