I have two lists like so:
found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']
I want to find the differences between the two lists. So far I have tried:
list(set(expected)-set(found))
and
list(set(found)-set(expected))
which return ['E3'] and ['E5'], respectively.
However, the answers I need are:
'E3' is missing from found.
'E5' is missing from expected.
There are 2 copies of 'E5' in found.
There are 3 copies of 'E2BS' in found.
There are 2 copies of 'E2' in found.
Any help/suggestions are welcome!
3 solutions
#1
8
The collections.Counter class excels at enumerating the differences between multisets:
>>> from collections import Counter
>>> found = Counter(['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5'])
>>> expected = Counter(['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3'])
>>> list((found - expected).elements())
['E2', 'E2BS', 'E2BS', 'E5', 'E5']
>>> list((expected - found).elements())
['E3']
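As a minimal sketch of how the Counter approach could produce the exact messages asked for in the question: the helper name describe_differences is hypothetical (it is not part of collections), and sorting the items is only an assumption made here to keep the output order stable.

```python
from collections import Counter

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2',
            'CG', 'E2BS', 'E3']


def describe_differences(found, expected):
    """Return human-readable lines describing how the two lists differ.

    Hypothetical helper written for illustration only.
    """
    found_counts = Counter(found)
    expected_counts = Counter(expected)
    lines = []

    # Items that occur in one list but never in the other.
    for item in sorted(set(expected_counts) - set(found_counts)):
        lines.append(f"{item!r} is missing from found.")
    for item in sorted(set(found_counts) - set(expected_counts)):
        lines.append(f"{item!r} is missing from expected.")

    # Items that occur more than once in found.
    for item, count in sorted(found_counts.items()):
        if count > 1:
            lines.append(f"There are {count} copies of {item!r} in found.")
    return lines


for line in describe_differences(found, expected):
    print(line)
```

The two set differences cover the "missing" messages, while the Counter supplies the per-item counts that plain sets discard.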
You might also be interested in difflib.Differ:
>>> from difflib import Differ
>>> found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
>>> expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']
>>> for d in Differ().compare(expected, found):
...     print(d)
+ CG
+ E6
  E1
  E2
  E4
+ L2
+ E7
+ E5
+ L1
+ E2BS
+ E2BS
+ E2BS
+ E2
  E1^E4
+ E5
- E6
- E7
- L1
- L2
- CG
- E2BS
- E3
#2
4
Leverage Python's set and Counter classes instead of rolling your own solution:
- symmetric_difference: finds elements that are in one set or the other, but not both.
- intersection: finds elements common to both sets.
- difference: essentially what you did by subtracting one set from another.
Code examples (converting the lists to sets first):
found_set = set(found)
expected_set = set(expected)
found_set.difference(expected_set)            # {'E5'}
expected_set.difference(found_set)            # {'E3'}
found_set.symmetric_difference(expected_set)  # {'E3', 'E5'}
- Finding copies of objects: this question was already referenced. Using that technique gets you all duplicates, and the resulting Counter object tells you how many copies of each there are. For example:
collections.Counter(found)['E5'] # 2
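The set methods and the Counter lookup described above can be combined roughly as follows. This is only a sketch: the variable names (found_set, expected_set, common, only_one_side, duplicates) are chosen here for illustration, and it assumes found and expected are the plain lists from the question.

```python
from collections import Counter

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2',
            'CG', 'E2BS', 'E3']

# Sets drop duplicates, so convert once and reuse.
found_set, expected_set = set(found), set(expected)

# Elements present in both lists.
common = found_set.intersection(expected_set)

# Elements present in exactly one of the two lists.
only_one_side = found_set.symmetric_difference(expected_set)

# Counts for every element of found that occurs more than once.
duplicates = {item: n for item, n in Counter(found).items() if n > 1}

print(sorted(only_one_side))  # ['E3', 'E5']
print(duplicates)             # {'E2': 2, 'E5': 2, 'E2BS': 3}
```

Sets answer the "which items differ" part, and the Counter answers the "how many copies" part that sets cannot.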
#3
2
You've already answered the first two:
print('{0} missing from found'.format(list(set(expected) - set(found))))
print('{0} missing from expected'.format(list(set(found) - set(expected))))
The rest require counting duplicates in a list, for which there are many solutions to be found online (including this one: Find and list duplicates in a list?).
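One of the many possible ways to count duplicates, sketched here without any imports: the function name list_duplicates is hypothetical, and this is not necessarily the technique used in the linked question.

```python
found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']


def list_duplicates(items):
    """Return each repeated item once, in the order its first repeat appears.

    Hypothetical helper written for illustration only.
    """
    seen = set()
    duplicates = []
    for item in items:
        # A repeat is an item already seen; record it only the first time.
        if item in seen and item not in duplicates:
            duplicates.append(item)
        seen.add(item)
    return duplicates


for item in list_duplicates(found):
    # list.count rescans the list each time, which is fine for short lists.
    print('There are {0} copies of {1!r} in found.'.format(found.count(item), item))
```

For long lists, collections.Counter avoids the repeated scans that list.count performs.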