如何查找列表中的重复项并使用它们创建另一个列表?

时间:2022-03-24 04:39:54

How can I find the duplicates in a Python list and create another list of the duplicates? The list only contains integers.

如何在Python列表中找到副本并创建另一个副本列表?这个列表只包含整数。

23 个解决方案

#1


296  

To remove duplicates use set(a), to print duplicates - something like

要删除复制,请使用set(a),打印复制-类似的东西

a = [1,2,3,2,1,5,6,5,5,5]import collectionsprint [item for item, count in collections.Counter(a).items() if count > 1]## [1, 2, 5]

Note that Counter is not particularly efficient (timings) and probably an overkill here, set will perform better. This code computes a list of unique elements in the source order:

注意,计数器并不是特别有效(计时),在这里可能是一个超杀,设置将执行得更好。此代码计算源代码中唯一元素的列表:

seen = set()uniq = []for x in a:    if x not in seen:        uniq.append(x)        seen.add(x)

or, more concisely:

或者,更简洁:

seen = set()uniq = [x for x in a if x not in seen and not seen.add(x)]    

I don't recommend the latter style, because it is not obvious what not seen.add(x) is doing (the set add() method always returns None, hence the need for not).

我不推荐使用后一种样式,因为不太明显的是,add(x)正在做什么(set add()方法总是返回None,因此需要not)。

To compute the list of duplicated elements without libraries,

要计算没有库的重复元素列表,

seen = {}dupes = []for x in a:    if x not in seen:        seen[x] = 1    else:        if seen[x] == 1:            dupes.append(x)        seen[x] += 1

If list elements are not hashable, you cannot use set/dicts and have to resort to a quadratic time solution (compare each which each), for example:

如果列表元素不可耐洗,则不能使用set/dicts,必须使用二次时间解决方案(对每个元素进行比较),例如:

a = [ [1], [2], [3], [1], [5], [3] ]no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]print no_dupes # [[1], [2], [3], [5]]dupes = [x for n, x in enumerate(a) if x in a[:n]]print dupes # [[1], [3]]

#2


224  

>>> l = [1,2,3,4,4,5,5,6,1]>>> set([x for x in l if l.count(x) > 1])set([1, 4, 5])

#3


60  

You don't need the count, just whether or not the item was seen before. Adapted that answer to this problem:

你不需要计数,不管之前是否见过。针对这个问题调整了这个答案:

def list_duplicates(seq):  seen = set()  seen_add = seen.add  # adds all elements it doesn't know yet to seen and all other to seen_twice  seen_twice = set( x for x in seq if x in seen or seen_add(x) )  # turn the set into a list (as requested)  return list( seen_twice )a = [1,2,3,2,1,5,6,5,5,5]list_duplicates(a) # yields [1, 2, 5]

Just in case speed matters, here are some timings:

为了以防速度的问题,这里有一些时间安排:

# file: test.pyimport collectionsdef thg435(l):    return [x for x, y in collections.Counter(l).items() if y > 1]def moooeeeep(l):    seen = set()    seen_add = seen.add    # adds all elements it doesn't know yet to seen and all other to seen_twice    seen_twice = set( x for x in l if x in seen or seen_add(x) )    # turn the set into a list (as requested)    return list( seen_twice )def RiteshKumar(l):    return list(set([x for x in l if l.count(x) > 1]))def JohnLaRooy(L):    seen = set()    seen2 = set()    seen_add = seen.add    seen2_add = seen2.add    for item in L:        if item in seen:            seen2_add(item)        else:            seen_add(item)    return list(seen2)l = [1,2,3,2,1,5,6,5,5,5]*100

Here are the results: (well done @JohnLaRooy!)

结果如下:(干得好@JohnLaRooy!)

$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'10000 loops, best of 3: 74.6 usec per loop$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'10000 loops, best of 3: 91.3 usec per loop$ python -mtimeit -s 'import test' 'test.thg435(test.l)'1000 loops, best of 3: 266 usec per loop$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'100 loops, best of 3: 8.35 msec per loop

Interestingly, besides the timings itself, also the ranking slightly changes when pypy is used. Most interestingly, the Counter-based approach benefits hugely from pypy's optimizations, whereas the method caching approach I have suggested seems to have almost no effect.

有趣的是,除了计时本身之外,使用pypy时排名也略有变化。最有趣的是,基于对策的方法从pypy的优化中获益良多,而我所建议的方法缓存方法似乎几乎没有任何效果。

$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'100000 loops, best of 3: 17.8 usec per loop$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'10000 loops, best of 3: 23 usec per loop$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'10000 loops, best of 3: 39.3 usec per loop

Apparantly this effect is related to the "duplicatedness" of the input data. I have set l = [random.randrange(1000000) for i in xrange(10000)] and got these results:

显然,这种效应与输入数据的“重复”有关。我在xrange(10000)中设置了l = [random.randrange(1000000)],得到了这些结果:

$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'1000 loops, best of 3: 495 usec per loop$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'1000 loops, best of 3: 499 usec per loop$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'1000 loops, best of 3: 1.68 msec per loop

#4


20  

I came across this question whilst looking in to something related - and wonder why no-one offered a generator based solution? Solving this problem would be:

我在查阅相关资料时遇到了这个问题,我想知道为什么没有人提供基于生成器的解决方案?解决这个问题将是:

>>> print list(getDupes_9([1,2,3,2,1,5,6,5,5,5]))[1, 2, 5]

I was concerned with scalability, so tested several approaches, including naive items that work well on small lists, but scale horribly as lists get larger (note- would have been better to use timeit, but this is illustrative).

我关注可伸缩性,因此测试了几种方法,包括在小列表上工作得很好、但随着列表变得更大而扩展得很糟糕的朴素的项目(注意——使用timeit会更好,但这是说明性的)。

I included @moooeeeep for comparison (it is impressively fast: fastest if the input list is completely random) and an itertools approach that is even faster again for mostly sorted lists... Now includes pandas approach from @firelynx -- slow, but not horribly so, and simple. Note - sort/tee/zip approach is consistently fastest on my machine for large mostly ordered lists, moooeeeep is fastest for shuffled lists, but your mileage may vary.

我包括了@moooeeeep进行比较(它非常快:如果输入列表完全是随机的,那么速度最快)和迭代工具方法,对于大多数排序的列表来说速度更快……现在包含了来自@firelynx的熊猫方法——缓慢,但并不可怕,而且简单。注意:在我的机器上,排序/tee/zip方法在大多数有序列表中是最快的,moooeeeep是最快速的洗牌列表,但是你的里程可能会有所不同。

Advantages

优势

  • very quick simple to test for 'any' duplicates using the same code
  • 非常简单,可以使用相同的代码测试“任何”重复

Assumptions

假设

  • Duplicates should be reported once only
  • 重复应该只报告一次
  • Duplicate order does not need to be preserved
  • 重复订单不需要保留
  • Duplicate might be anywhere in the list
  • 副本可能在列表中的任何地方

Fastest solution, 1m entries:

最快的解决方案,1 m的条目:

def getDupes(c):        '''sort/tee/izip'''        a, b = itertools.tee(sorted(c))        next(b, None)        r = None        for k, g in itertools.izip(a, b):            if k != g: continue            if k != r:                yield k                r = k

Approaches tested

方法测试

import itertoolsimport timeimport randomdef getDupes_1(c):    '''naive'''    for i in xrange(0, len(c)):        if c[i] in c[:i]:            yield c[i]def getDupes_2(c):    '''set len change'''    s = set()    for i in c:        l = len(s)        s.add(i)        if len(s) == l:            yield idef getDupes_3(c):    '''in dict'''    d = {}    for i in c:        if i in d:            if d[i]:                yield i                d[i] = False        else:            d[i] = Truedef getDupes_4(c):    '''in set'''    s,r = set(),set()    for i in c:        if i not in s:            s.add(i)        elif i not in r:            r.add(i)            yield idef getDupes_5(c):    '''sort/adjacent'''    c = sorted(c)    r = None    for i in xrange(1, len(c)):        if c[i] == c[i - 1]:            if c[i] != r:                yield c[i]                r = c[i]def getDupes_6(c):    '''sort/groupby'''    def multiple(x):        try:            x.next()            x.next()            return True        except:            return False    for k, g in itertools.ifilter(lambda x: multiple(x[1]), itertools.groupby(sorted(c))):        yield kdef getDupes_7(c):    '''sort/zip'''    c = sorted(c)    r = None    for k, g in zip(c[:-1],c[1:]):        if k == g:            if k != r:                yield k                r = kdef getDupes_8(c):    '''sort/izip'''    c = sorted(c)    r = None    for k, g in itertools.izip(c[:-1],c[1:]):        if k == g:            if k != r:                yield k                r = kdef getDupes_9(c):    '''sort/tee/izip'''    a, b = itertools.tee(sorted(c))    next(b, None)    r = None    for k, g in itertools.izip(a, b):        if k != g: continue        if k != r:            yield k            r = kdef getDupes_a(l):    '''moooeeeep'''    seen = set()    seen_add = seen.add    # adds all elements it doesn't know yet to seen and all other to seen_twice    for x in l:        if x in seen or seen_add(x):            yield xdef getDupes_b(x):    '''iter*/sorted'''    x = sorted(x)    def _matches():        for k,g in itertools.izip(x[:-1],x[1:]):            if k == g:                yield k    for k, n in itertools.groupby(_matches()):        yield kdef getDupes_c(a):    '''pandas'''    import pandas as pd    vc = pd.Series(a).value_counts()    i = vc[vc > 1].index    for _ in i:        yield _def hasDupes(fn,c):    try:        if fn(c).next(): return True    # Found a dupe    except StopIteration:        pass    return Falsedef getDupes(fn,c):    return list(fn(c))STABLE = Trueif STABLE:    print 'Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element array'else:    print 'Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element array'for location in (50,250000,500000,750000,999999):    for test in (getDupes_2, getDupes_3, getDupes_4, getDupes_5, getDupes_6,                 getDupes_8, getDupes_9, getDupes_a, getDupes_b, getDupes_c):        print 'Test %-15s:%10d - '%(test.__doc__ or test.__name__,location),        deltas = []        for FIRST in (True,False):            for i in xrange(0, 5):                c = range(0,1000000)                if STABLE:                    c[0] = location                else:                    c.append(location)                    random.shuffle(c)                start = time.time()                if FIRST:                    print '.' if location == test(c).next() else '!',                else:                    print '.' if [location] == list(test(c)) else '!',                deltas.append(time.time()-start)            print ' -- %0.3f  '%(sum(deltas)/len(deltas)),        print    print

The results for the 'all dupes' test were consistent, finding "first" duplicate then "all" duplicates in this array:

“所有dupes”测试的结果是一致的,在这个数组中查找“first”duplicate,然后“all”duplicate:

Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element arrayTest set len change :    500000 -  . . . . .  -- 0.264   . . . . .  -- 0.402  Test in dict        :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.250  Test in set         :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.249  Test sort/adjacent  :    500000 -  . . . . .  -- 0.159   . . . . .  -- 0.229  Test sort/groupby   :    500000 -  . . . . .  -- 0.860   . . . . .  -- 1.286  Test sort/izip      :    500000 -  . . . . .  -- 0.165   . . . . .  -- 0.229  Test sort/tee/izip  :    500000 -  . . . . .  -- 0.145   . . . . .  -- 0.206  *Test moooeeeep      :    500000 -  . . . . .  -- 0.149   . . . . .  -- 0.232  Test iter*/sorted   :    500000 -  . . . . .  -- 0.160   . . . . .  -- 0.221  Test pandas         :    500000 -  . . . . .  -- 0.493   . . . . .  -- 0.499  

When the lists are shuffled first, the price of the sort becomes apparent - the efficiency drops noticeably and the @moooeeeep approach dominates, with set & dict approaches being similar but lessor performers:

当首先对列表进行排序时,排序的代价变得明显——效率显著下降,@moooeeeep方法占主导地位,set & dict方法类似但较弱的执行者:

Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element arrayTest set len change :    500000 -  . . . . .  -- 0.321   . . . . .  -- 0.473  Test in dict        :    500000 -  . . . . .  -- 0.285   . . . . .  -- 0.360  Test in set         :    500000 -  . . . . .  -- 0.309   . . . . .  -- 0.365  Test sort/adjacent  :    500000 -  . . . . .  -- 0.756   . . . . .  -- 0.823  Test sort/groupby   :    500000 -  . . . . .  -- 1.459   . . . . .  -- 1.896  Test sort/izip      :    500000 -  . . . . .  -- 0.786   . . . . .  -- 0.845  Test sort/tee/izip  :    500000 -  . . . . .  -- 0.743   . . . . .  -- 0.804  Test moooeeeep      :    500000 -  . . . . .  -- 0.234   . . . . .  -- 0.311  *Test iter*/sorted   :    500000 -  . . . . .  -- 0.776   . . . . .  -- 0.840  Test pandas         :    500000 -  . . . . .  -- 0.539   . . . . .  -- 0.540  

#5


9  

collections.Counter is new in python 2.7:

集合。计数器在python 2.7中是新的:

Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2a = [1,2,3,2,1,5,6,5,5,5]import collectionsprint [x for x, y in collections.Counter(a).items() if y > 1]Type "help", "copyright", "credits" or "license" for more information.  File "", line 1, in AttributeError: 'module' object has no attribute 'Counter'>>> 

In an earlier version you can use a conventional dict instead:

在早期版本中,你可以使用传统的法令:

a = [1,2,3,2,1,5,6,5,5,5]d = {}for elem in a:    if elem in d:        d[elem] += 1    else:        d[elem] = 1print [x for x, y in d.items() if y > 1]

#6


6  

Using pandas:

使用熊猫:

>>> import pandas as pd>>> a = [1, 2, 1, 3, 3, 3, 0]>>> pd.Series(a)[pd.Series(a).duplicated()].valuesarray([1, 3, 3])

#7


5  

Here's a neat and concise solution -

这里有一个简洁明了的解决方案。

for x in set(li):    li.remove(x)li = list(set(li))

#8


4  

I would do this with pandas, because I use pandas a lot

我会用熊猫做这个,因为我经常用熊猫

import pandas as pda = [1,2,3,3,3,4,5,6,6,7]vc = pd.Series(a).value_counts()vc[vc > 1].index.tolist()

Gives

给了

[3,6]

Probably isn't very efficient, but it sure is less code than a lot of the other answers, so I thought I would contribute

可能不是很有效,但是它的代码比其他很多答案都要少,所以我想我应该贡献。

#9


3  

How about simply loop through each element in the list by checking the number of occurrences, then adding them to a set which will then print the duplicates. Hope this helps someone out there.

简单地循环遍历列表中的每个元素,检查出现的次数,然后将它们添加到一个集合中,然后打印副本。希望这能帮助别人。

myList  = [2 ,4 , 6, 8, 4, 6, 12];newList = set()for i in myList:    if myList.count(i) >= 2:        newList.add(i)print(list(newList))## [4 , 6]

#10


2  

A bit late, but maybe helpful for some.For a largish list, I found this worked for me.

有点晚了,但可能对一些人有帮助。对于一个较大的列表,我发现这对我很有效。

l=[1,2,3,5,4,1,3,1]s=set(l)d=[]for x in l:    if x in s:        s.remove(x)    else:        d.append(x)d[1,3,1]

Shows just and all duplicates and preserves order.

显示所有的重复和保存顺序。

#11


2  

the third example of the accepted answer give an erroneous answer and does not attempt to give duplicates. Here is the correct version :

第三个被接受的答案的例子给出了一个错误的答案,并且没有试图给出重复的答案。以下是正确的版本:

number_lst = [1, 1, 2, 3, 5, ...]seen_set = set()duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))unique_set = seen_set - duplicate_set

#12


2  

Very simple and quick way of finding dupes with one iteration in Python is:

在Python中找到具有一个迭代的dupes的非常简单和快速的方法是:

testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']testListDict = {}for item in testList:  try:    testListDict[item] += 1  except:    testListDict[item] = 1print testListDict

Output will be as follows:

输出如下:

>>> print testListDict{'blue': 3, 'green': 1, 'red': 2}

This and more in my blog http://www.howtoprogramwithpython.com

在我的博客http://www.howtoprogramwithpython.com上还有更多

#13


2  

You can use iteration_utilities.duplicates:

您可以使用iteration_utilities.duplicates:

>>> from iteration_utilities import duplicates>>> list(duplicates([1,1,2,1,2,3,4,2]))[1, 1, 2, 2]

or if you only want one of each duplicate this can be combined with iteration_utilities.unique_everseen:

或者,如果您只想要一个副本,可以与iteration_utility .unique_everseen结合:

>>> from iteration_utilities import unique_everseen>>> list(unique_everseen(duplicates([1,1,2,1,2,3,4,2])))[1, 2]

1 This is from a third-party library I have written: iteration_utilities.

这是我编写的第三方库:iteration_utilities。

#14


1  

We can use itertools.groupby in order to find all the items that have dups:

出现我们可以使用itertools。groupby用于查找所有有dups的项目:

from itertools import groupbymyList  = [2, 4, 6, 8, 4, 6, 12]# when the list is sorted, groupby groups by consecutive elements which are similarfor x, y in groupby(sorted(myList)):    #  list(y) returns all the occurences of item x    if len(list(y)) > 1:        print x  

The output will be:

的输出将会是:

46

#15


1  

Without converting to list and probably the simplest way would be something like below.This may be useful during a interview when they ask not to use sets

如果不转换为list,可能最简单的方式是如下所示。在面试中,当他们要求不要使用集合时,这可能是有用的

    a=[1,2,3,3,3]    dup=[]    for each in a:      if each not in dup:        dup.append(each)    print(dup)

#16


0  

list2 = [1, 2, 3, 4, 1, 2, 3]lset = set()[(lset.add(item), list2.append(item)) for item in list2 if item not in lset]print list(lset)

#17


0  

There are a lot of answers up here, but I think this is relatively a very readable and easy to understand approach:

这里有很多答案,但我认为这是一个相对易读和容易理解的方法:

def get_duplicates(sorted_list):    duplicates = []    last = sorted_list[0]    for x in sorted_list[1:]:        if x == last:            duplicates.append(x)        last = x    return set(duplicates)

Notes:

注:

  • If you wish to preserve duplication count, get rid of the castto 'set' at the bottom to get the full list
  • 如果您希望保留重复计数,请删除底部的castto“set”以获得完整的列表
  • If you prefer to use generators, replace duplicates.append(x) with yield x and the return statement at the bottom (you can cast to set later)
  • 如果您更喜欢使用生成器,则可以使用yield x和底部的return语句(您可以在以后设置)

#18


0  

Here's a fast generator that uses a dict to store each element as a key with a boolean value for checking if the duplicate item has already been yielded.

这里有一个快速生成器,它使用一个命令将每个元素存储为一个带布尔值的键,用于检查是否已经生成了重复项。

For lists with all elements that are hashable types:

对于所有元素都是可耐洗类型的列表:

def gen_dupes(array):    unique = {}    for value in array:        if value in unique and unique[value]:            unique[value] = False            yield value        else:            unique[value] = Truearray = [1, 2, 2, 3, 4, 1, 5, 2, 6, 6]print(list(gen_dupes(array)))# => [2, 1, 6]

For lists that might contain lists:

可能包含列表的列表:

def gen_dupes(array):    unique = {}    for value in array:        is_list = False        if type(value) is list:            value = tuple(value)            is_list = True        if value in unique and unique[value]:            unique[value] = False            if is_list:                value = list(value)            yield value        else:            unique[value] = Truearray = [1, 2, 2, [1, 2], 3, 4, [1, 2], 5, 2, 6, 6]print(list(gen_dupes(array)))# => [2, [1, 2], 6]

#19


0  

def removeduplicates(a):  seen = set()  for i in a:    if i not in seen:      seen.add(i)  return seen print(removeduplicates([1,1,2,2]))

#20


0  

Some other tests. Of course to do...

其他一些测试。当然做……

set([x for x in l if l.count(x) > 1])

...is too costly. It about 500 of times faster (the more long array gives better results) to use the next final method:

…太昂贵了。大约500倍的速度(更长的数组给出更好的结果)使用下一个最终的方法:

def dups_count_dict(l):    d = {}    for item in l:        if item not in d:            d[item] = 0        d[item] += 1    result_d = {key: val for key, val in d.iteritems() if val > 1}    return result_d.keys()

Only 2 loops, no very costly l.count() operations.

只有两个循环,没有非常昂贵的l.count()操作。

Here is a code to compare the methods for example. The code is below, here is the output:

这里有一个比较方法的代码。代码如下,输出如下:

dups_count: 13.368s # this is a function which uses l.count()dups_count_dict: 0.014s # this is a final best function (of the 3 functions)dups_count_counter: 0.024s # collections.Counter

The testing code:

测试代码:

import numpy as npfrom time import timefrom collections import Counterclass TimerCounter(object):    def __init__(self):        self._time_sum = 0    def start(self):        self.time = time()    def stop(self):        self._time_sum += time() - self.time    def get_time_sum(self):        return self._time_sumdef dups_count(l):    return set([x for x in l if l.count(x) > 1])def dups_count_dict(l):    d = {}    for item in l:        if item not in d:            d[item] = 0        d[item] += 1    result_d = {key: val for key, val in d.iteritems() if val > 1}    return result_d.keys()def dups_counter(l):    counter = Counter(l)        result_d = {key: val for key, val in counter.iteritems() if val > 1}    return result_d.keys()def gen_array():    np.random.seed(17)    return list(np.random.randint(0, 5000, 10000))def assert_equal_results(*results):    primary_result = results[0]    other_results = results[1:]    for other_result in other_results:        assert set(primary_result) == set(other_result) and len(primary_result) == len(other_result)if __name__ == '__main__':    dups_count_time = TimerCounter()    dups_count_dict_time = TimerCounter()    dups_count_counter = TimerCounter()    l = gen_array()    for i in range(3):        dups_count_time.start()        result1 = dups_count(l)        dups_count_time.stop()        dups_count_dict_time.start()        result2 = dups_count_dict(l)        dups_count_dict_time.stop()        dups_count_counter.start()        result3 = dups_counter(l)        dups_count_counter.stop()        assert_equal_results(result1, result2, result3)    print 'dups_count: %.3f' % dups_count_time.get_time_sum()    print 'dups_count_dict: %.3f' % dups_count_dict_time.get_time_sum()    print 'dups_count_counter: %.3f' % dups_count_counter.get_time_sum()

#21


-1  

One line solution:

一行的解决方案:

set([i for i in list if sum([1 for a in list if a == i]) > 1])

#22


-1  

this is the way I had to do it because I challenged myself not to use other methods:

这是我必须要做的,因为我要求自己不要使用其他方法:

def dupList(oldlist):    if type(oldlist)==type((2,2)):        oldlist=[x for x in oldlist]    newList=[]    newList=newList+oldlist    oldlist=oldlist    forbidden=[]    checkPoint=0    for i in range(len(oldlist)):        #print 'start i', i        if i in forbidden:            continue        else:            for j in range(len(oldlist)):                #print 'start j', j                if j in forbidden:                    continue                else:                    #print 'after Else'                    if i!=j:                         #print 'i,j', i,j                        #print oldlist                        #print newList                        if oldlist[j]==oldlist[i]:                            #print 'oldlist[i],oldlist[j]', oldlist[i],oldlist[j]                            forbidden.append(j)                            #print 'forbidden', forbidden                            del newList[j-checkPoint]                            #print newList                            checkPoint=checkPoint+1    return newList

so your sample works as:

所以你的样本是这样的:

>>>a = [1,2,3,3,3,4,5,6,6,7]>>>dupList(a)[1, 2, 3, 4, 5, 6, 7]

#23


-4  

Use the sort() function. Duplicates can be identified by looping over it and checking l1[i] == l1[i+1].

使用sort()函数。重复可以通过在其上循环和检查l1[i] = l1[i+1]来识别。

#1


296  

To remove duplicates use set(a), to print duplicates - something like

要删除复制,请使用set(a),打印复制-类似的东西

a = [1,2,3,2,1,5,6,5,5,5]import collectionsprint [item for item, count in collections.Counter(a).items() if count > 1]## [1, 2, 5]

Note that Counter is not particularly efficient (timings) and probably an overkill here, set will perform better. This code computes a list of unique elements in the source order:

注意,计数器并不是特别有效(计时),在这里可能是一个超杀,设置将执行得更好。此代码计算源代码中唯一元素的列表:

seen = set()uniq = []for x in a:    if x not in seen:        uniq.append(x)        seen.add(x)

or, more concisely:

或者,更简洁:

seen = set()uniq = [x for x in a if x not in seen and not seen.add(x)]    

I don't recommend the latter style, because it is not obvious what not seen.add(x) is doing (the set add() method always returns None, hence the need for not).

我不推荐使用后一种样式,因为不太明显的是,add(x)正在做什么(set add()方法总是返回None,因此需要not)。

To compute the list of duplicated elements without libraries,

要计算没有库的重复元素列表,

seen = {}dupes = []for x in a:    if x not in seen:        seen[x] = 1    else:        if seen[x] == 1:            dupes.append(x)        seen[x] += 1

If list elements are not hashable, you cannot use set/dicts and have to resort to a quadratic time solution (compare each which each), for example:

如果列表元素不可耐洗,则不能使用set/dicts,必须使用二次时间解决方案(对每个元素进行比较),例如:

a = [ [1], [2], [3], [1], [5], [3] ]no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]print no_dupes # [[1], [2], [3], [5]]dupes = [x for n, x in enumerate(a) if x in a[:n]]print dupes # [[1], [3]]

#2


224  

>>> l = [1,2,3,4,4,5,5,6,1]>>> set([x for x in l if l.count(x) > 1])set([1, 4, 5])

#3


60  

You don't need the count, just whether or not the item was seen before. Adapted that answer to this problem:

你不需要计数,不管之前是否见过。针对这个问题调整了这个答案:

def list_duplicates(seq):  seen = set()  seen_add = seen.add  # adds all elements it doesn't know yet to seen and all other to seen_twice  seen_twice = set( x for x in seq if x in seen or seen_add(x) )  # turn the set into a list (as requested)  return list( seen_twice )a = [1,2,3,2,1,5,6,5,5,5]list_duplicates(a) # yields [1, 2, 5]

Just in case speed matters, here are some timings:

为了以防速度的问题,这里有一些时间安排:

# file: test.pyimport collectionsdef thg435(l):    return [x for x, y in collections.Counter(l).items() if y > 1]def moooeeeep(l):    seen = set()    seen_add = seen.add    # adds all elements it doesn't know yet to seen and all other to seen_twice    seen_twice = set( x for x in l if x in seen or seen_add(x) )    # turn the set into a list (as requested)    return list( seen_twice )def RiteshKumar(l):    return list(set([x for x in l if l.count(x) > 1]))def JohnLaRooy(L):    seen = set()    seen2 = set()    seen_add = seen.add    seen2_add = seen2.add    for item in L:        if item in seen:            seen2_add(item)        else:            seen_add(item)    return list(seen2)l = [1,2,3,2,1,5,6,5,5,5]*100

Here are the results: (well done @JohnLaRooy!)

结果如下:(干得好@JohnLaRooy!)

$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'10000 loops, best of 3: 74.6 usec per loop$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'10000 loops, best of 3: 91.3 usec per loop$ python -mtimeit -s 'import test' 'test.thg435(test.l)'1000 loops, best of 3: 266 usec per loop$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'100 loops, best of 3: 8.35 msec per loop

Interestingly, besides the timings itself, also the ranking slightly changes when pypy is used. Most interestingly, the Counter-based approach benefits hugely from pypy's optimizations, whereas the method caching approach I have suggested seems to have almost no effect.

有趣的是,除了计时本身之外,使用pypy时排名也略有变化。最有趣的是,基于对策的方法从pypy的优化中获益良多,而我所建议的方法缓存方法似乎几乎没有任何效果。

$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'100000 loops, best of 3: 17.8 usec per loop$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'10000 loops, best of 3: 23 usec per loop$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'10000 loops, best of 3: 39.3 usec per loop

Apparantly this effect is related to the "duplicatedness" of the input data. I have set l = [random.randrange(1000000) for i in xrange(10000)] and got these results:

显然,这种效应与输入数据的“重复”有关。我在xrange(10000)中设置了l = [random.randrange(1000000)],得到了这些结果:

$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'1000 loops, best of 3: 495 usec per loop$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'1000 loops, best of 3: 499 usec per loop$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'1000 loops, best of 3: 1.68 msec per loop

#4


20  

I came across this question whilst looking in to something related - and wonder why no-one offered a generator based solution? Solving this problem would be:

我在查阅相关资料时遇到了这个问题,我想知道为什么没有人提供基于生成器的解决方案?解决这个问题将是:

>>> print list(getDupes_9([1,2,3,2,1,5,6,5,5,5]))[1, 2, 5]

I was concerned with scalability, so tested several approaches, including naive items that work well on small lists, but scale horribly as lists get larger (note- would have been better to use timeit, but this is illustrative).

我关注可伸缩性,因此测试了几种方法,包括在小列表上工作得很好、但随着列表变得更大而扩展得很糟糕的朴素的项目(注意——使用timeit会更好,但这是说明性的)。

I included @moooeeeep for comparison (it is impressively fast: fastest if the input list is completely random) and an itertools approach that is even faster again for mostly sorted lists... Now includes pandas approach from @firelynx -- slow, but not horribly so, and simple. Note - sort/tee/zip approach is consistently fastest on my machine for large mostly ordered lists, moooeeeep is fastest for shuffled lists, but your mileage may vary.

我包括了@moooeeeep进行比较(它非常快:如果输入列表完全是随机的,那么速度最快)和迭代工具方法,对于大多数排序的列表来说速度更快……现在包含了来自@firelynx的熊猫方法——缓慢,但并不可怕,而且简单。注意:在我的机器上,排序/tee/zip方法在大多数有序列表中是最快的,moooeeeep是最快速的洗牌列表,但是你的里程可能会有所不同。

Advantages

优势

  • very quick simple to test for 'any' duplicates using the same code
  • 非常简单,可以使用相同的代码测试“任何”重复

Assumptions

假设

  • Duplicates should be reported once only
  • 重复应该只报告一次
  • Duplicate order does not need to be preserved
  • 重复订单不需要保留
  • Duplicate might be anywhere in the list
  • 副本可能在列表中的任何地方

Fastest solution, 1m entries:

最快的解决方案,1 m的条目:

def getDupes(c):        '''sort/tee/izip'''        a, b = itertools.tee(sorted(c))        next(b, None)        r = None        for k, g in itertools.izip(a, b):            if k != g: continue            if k != r:                yield k                r = k

Approaches tested

方法测试

import itertoolsimport timeimport randomdef getDupes_1(c):    '''naive'''    for i in xrange(0, len(c)):        if c[i] in c[:i]:            yield c[i]def getDupes_2(c):    '''set len change'''    s = set()    for i in c:        l = len(s)        s.add(i)        if len(s) == l:            yield idef getDupes_3(c):    '''in dict'''    d = {}    for i in c:        if i in d:            if d[i]:                yield i                d[i] = False        else:            d[i] = Truedef getDupes_4(c):    '''in set'''    s,r = set(),set()    for i in c:        if i not in s:            s.add(i)        elif i not in r:            r.add(i)            yield idef getDupes_5(c):    '''sort/adjacent'''    c = sorted(c)    r = None    for i in xrange(1, len(c)):        if c[i] == c[i - 1]:            if c[i] != r:                yield c[i]                r = c[i]def getDupes_6(c):    '''sort/groupby'''    def multiple(x):        try:            x.next()            x.next()            return True        except:            return False    for k, g in itertools.ifilter(lambda x: multiple(x[1]), itertools.groupby(sorted(c))):        yield kdef getDupes_7(c):    '''sort/zip'''    c = sorted(c)    r = None    for k, g in zip(c[:-1],c[1:]):        if k == g:            if k != r:                yield k                r = kdef getDupes_8(c):    '''sort/izip'''    c = sorted(c)    r = None    for k, g in itertools.izip(c[:-1],c[1:]):        if k == g:            if k != r:                yield k                r = kdef getDupes_9(c):    '''sort/tee/izip'''    a, b = itertools.tee(sorted(c))    next(b, None)    r = None    for k, g in itertools.izip(a, b):        if k != g: continue        if k != r:            yield k            r = kdef getDupes_a(l):    '''moooeeeep'''    seen = set()    seen_add = seen.add    # adds all elements it doesn't know yet to seen and all other to seen_twice    for x in l:        if x in seen or seen_add(x):            yield xdef getDupes_b(x):    '''iter*/sorted'''    x = sorted(x)    def _matches():        for k,g in itertools.izip(x[:-1],x[1:]):            if k == g:                yield k    for k, n in itertools.groupby(_matches()):        yield kdef getDupes_c(a):    '''pandas'''    import pandas as pd    vc = pd.Series(a).value_counts()    i = vc[vc > 1].index    for _ in i:        yield _def hasDupes(fn,c):    try:        if fn(c).next(): return True    # Found a dupe    except StopIteration:        pass    return Falsedef getDupes(fn,c):    return list(fn(c))STABLE = Trueif STABLE:    print 'Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element array'else:    print 'Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element array'for location in (50,250000,500000,750000,999999):    for test in (getDupes_2, getDupes_3, getDupes_4, getDupes_5, getDupes_6,                 getDupes_8, getDupes_9, getDupes_a, getDupes_b, getDupes_c):        print 'Test %-15s:%10d - '%(test.__doc__ or test.__name__,location),        deltas = []        for FIRST in (True,False):            for i in xrange(0, 5):                c = range(0,1000000)                if STABLE:                    c[0] = location                else:                    c.append(location)                    random.shuffle(c)                start = time.time()                if FIRST:                    print '.' if location == test(c).next() else '!',                else:                    print '.' if [location] == list(test(c)) else '!',                deltas.append(time.time()-start)            print ' -- %0.3f  '%(sum(deltas)/len(deltas)),        print    print

The results for the 'all dupes' test were consistent, finding "first" duplicate then "all" duplicates in this array:

“所有dupes”测试的结果是一致的,在这个数组中查找“first”duplicate,然后“all”duplicate:

Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element arrayTest set len change :    500000 -  . . . . .  -- 0.264   . . . . .  -- 0.402  Test in dict        :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.250  Test in set         :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.249  Test sort/adjacent  :    500000 -  . . . . .  -- 0.159   . . . . .  -- 0.229  Test sort/groupby   :    500000 -  . . . . .  -- 0.860   . . . . .  -- 1.286  Test sort/izip      :    500000 -  . . . . .  -- 0.165   . . . . .  -- 0.229  Test sort/tee/izip  :    500000 -  . . . . .  -- 0.145   . . . . .  -- 0.206  *Test moooeeeep      :    500000 -  . . . . .  -- 0.149   . . . . .  -- 0.232  Test iter*/sorted   :    500000 -  . . . . .  -- 0.160   . . . . .  -- 0.221  Test pandas         :    500000 -  . . . . .  -- 0.493   . . . . .  -- 0.499  

When the lists are shuffled first, the price of the sort becomes apparent - the efficiency drops noticeably and the @moooeeeep approach dominates, with set & dict approaches being similar but lessor performers:

当首先对列表进行排序时,排序的代价变得明显——效率显著下降,@moooeeeep方法占主导地位,set & dict方法类似但较弱的执行者:

Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element arrayTest set len change :    500000 -  . . . . .  -- 0.321   . . . . .  -- 0.473  Test in dict        :    500000 -  . . . . .  -- 0.285   . . . . .  -- 0.360  Test in set         :    500000 -  . . . . .  -- 0.309   . . . . .  -- 0.365  Test sort/adjacent  :    500000 -  . . . . .  -- 0.756   . . . . .  -- 0.823  Test sort/groupby   :    500000 -  . . . . .  -- 1.459   . . . . .  -- 1.896  Test sort/izip      :    500000 -  . . . . .  -- 0.786   . . . . .  -- 0.845  Test sort/tee/izip  :    500000 -  . . . . .  -- 0.743   . . . . .  -- 0.804  Test moooeeeep      :    500000 -  . . . . .  -- 0.234   . . . . .  -- 0.311  *Test iter*/sorted   :    500000 -  . . . . .  -- 0.776   . . . . .  -- 0.840  Test pandas         :    500000 -  . . . . .  -- 0.539   . . . . .  -- 0.540  

#5


9  

collections.Counter is new in python 2.7:

集合。计数器在python 2.7中是新的:

Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2a = [1,2,3,2,1,5,6,5,5,5]import collectionsprint [x for x, y in collections.Counter(a).items() if y > 1]Type "help", "copyright", "credits" or "license" for more information.  File "", line 1, in AttributeError: 'module' object has no attribute 'Counter'>>> 

In an earlier version you can use a conventional dict instead:

在早期版本中,你可以使用传统的法令:

a = [1,2,3,2,1,5,6,5,5,5]d = {}for elem in a:    if elem in d:        d[elem] += 1    else:        d[elem] = 1print [x for x, y in d.items() if y > 1]

#6


6  

Using pandas:

使用熊猫:

>>> import pandas as pd>>> a = [1, 2, 1, 3, 3, 3, 0]>>> pd.Series(a)[pd.Series(a).duplicated()].valuesarray([1, 3, 3])

#7


5  

Here's a neat and concise solution -

这里有一个简洁明了的解决方案。

for x in set(li):    li.remove(x)li = list(set(li))

#8


4  

I would do this with pandas, because I use pandas a lot

我会用熊猫做这个,因为我经常用熊猫

import pandas as pda = [1,2,3,3,3,4,5,6,6,7]vc = pd.Series(a).value_counts()vc[vc > 1].index.tolist()

Gives

给了

[3,6]

Probably isn't very efficient, but it sure is less code than a lot of the other answers, so I thought I would contribute

可能不是很有效,但是它的代码比其他很多答案都要少,所以我想我应该贡献。

#9


3  

How about simply loop through each element in the list by checking the number of occurrences, then adding them to a set which will then print the duplicates. Hope this helps someone out there.

简单地循环遍历列表中的每个元素,检查出现的次数,然后将它们添加到一个集合中,然后打印副本。希望这能帮助别人。

myList  = [2 ,4 , 6, 8, 4, 6, 12];newList = set()for i in myList:    if myList.count(i) >= 2:        newList.add(i)print(list(newList))## [4 , 6]

#10


2  

A bit late, but maybe helpful for some.For a largish list, I found this worked for me.

有点晚了,但可能对一些人有帮助。对于一个较大的列表,我发现这对我很有效。

l=[1,2,3,5,4,1,3,1]s=set(l)d=[]for x in l:    if x in s:        s.remove(x)    else:        d.append(x)d[1,3,1]

Shows just and all duplicates and preserves order.

显示所有的重复和保存顺序。

#11


2  

the third example of the accepted answer give an erroneous answer and does not attempt to give duplicates. Here is the correct version :

第三个被接受的答案的例子给出了一个错误的答案,并且没有试图给出重复的答案。以下是正确的版本:

number_lst = [1, 1, 2, 3, 5, ...]seen_set = set()duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))unique_set = seen_set - duplicate_set

#12


2  

Very simple and quick way of finding dupes with one iteration in Python is:

在Python中找到具有一个迭代的dupes的非常简单和快速的方法是:

testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']testListDict = {}for item in testList:  try:    testListDict[item] += 1  except:    testListDict[item] = 1print testListDict

Output will be as follows:

输出如下:

>>> print testListDict{'blue': 3, 'green': 1, 'red': 2}

This and more in my blog http://www.howtoprogramwithpython.com

在我的博客http://www.howtoprogramwithpython.com上还有更多

#13


2  

You can use iteration_utilities.duplicates:

您可以使用iteration_utilities.duplicates:

>>> from iteration_utilities import duplicates>>> list(duplicates([1,1,2,1,2,3,4,2]))[1, 1, 2, 2]

or if you only want one of each duplicate this can be combined with iteration_utilities.unique_everseen:

或者,如果您只想要一个副本,可以与iteration_utility .unique_everseen结合:

>>> from iteration_utilities import unique_everseen>>> list(unique_everseen(duplicates([1,1,2,1,2,3,4,2])))[1, 2]

1 This is from a third-party library I have written: iteration_utilities.

这是我编写的第三方库:iteration_utilities。

#14


1  

We can use itertools.groupby in order to find all the items that have dups:

出现我们可以使用itertools。groupby用于查找所有有dups的项目:

from itertools import groupbymyList  = [2, 4, 6, 8, 4, 6, 12]# when the list is sorted, groupby groups by consecutive elements which are similarfor x, y in groupby(sorted(myList)):    #  list(y) returns all the occurences of item x    if len(list(y)) > 1:        print x  

The output will be:

的输出将会是:

46

#15


1  

Without converting to list and probably the simplest way would be something like below.This may be useful during a interview when they ask not to use sets

如果不转换为list,可能最简单的方式是如下所示。在面试中,当他们要求不要使用集合时,这可能是有用的

    a=[1,2,3,3,3]    dup=[]    for each in a:      if each not in dup:        dup.append(each)    print(dup)

#16


0  

list2 = [1, 2, 3, 4, 1, 2, 3]lset = set()[(lset.add(item), list2.append(item)) for item in list2 if item not in lset]print list(lset)

#17


0  

There are a lot of answers up here, but I think this is relatively a very readable and easy to understand approach:

这里有很多答案,但我认为这是一个相对易读和容易理解的方法:

def get_duplicates(sorted_list):    duplicates = []    last = sorted_list[0]    for x in sorted_list[1:]:        if x == last:            duplicates.append(x)        last = x    return set(duplicates)

Notes:

注:

  • If you wish to preserve duplication count, get rid of the castto 'set' at the bottom to get the full list
  • 如果您希望保留重复计数,请删除底部的castto“set”以获得完整的列表
  • If you prefer to use generators, replace duplicates.append(x) with yield x and the return statement at the bottom (you can cast to set later)
  • 如果您更喜欢使用生成器,则可以使用yield x和底部的return语句(您可以在以后设置)

#18


0  

Here's a fast generator that uses a dict to store each element as a key with a boolean value for checking if the duplicate item has already been yielded.

这里有一个快速生成器,它使用一个命令将每个元素存储为一个带布尔值的键,用于检查是否已经生成了重复项。

For lists with all elements that are hashable types:

对于所有元素都是可耐洗类型的列表:

def gen_dupes(array):    unique = {}    for value in array:        if value in unique and unique[value]:            unique[value] = False            yield value        else:            unique[value] = Truearray = [1, 2, 2, 3, 4, 1, 5, 2, 6, 6]print(list(gen_dupes(array)))# => [2, 1, 6]

For lists that might contain lists:

可能包含列表的列表:

def gen_dupes(array):    unique = {}    for value in array:        is_list = False        if type(value) is list:            value = tuple(value)            is_list = True        if value in unique and unique[value]:            unique[value] = False            if is_list:                value = list(value)            yield value        else:            unique[value] = Truearray = [1, 2, 2, [1, 2], 3, 4, [1, 2], 5, 2, 6, 6]print(list(gen_dupes(array)))# => [2, [1, 2], 6]

#19


0  

def removeduplicates(a):  seen = set()  for i in a:    if i not in seen:      seen.add(i)  return seen print(removeduplicates([1,1,2,2]))

#20


0  

Some other tests. Of course to do...

其他一些测试。当然做……

set([x for x in l if l.count(x) > 1])

...is too costly. It about 500 of times faster (the more long array gives better results) to use the next final method:

…太昂贵了。大约500倍的速度(更长的数组给出更好的结果)使用下一个最终的方法:

def dups_count_dict(l):    d = {}    for item in l:        if item not in d:            d[item] = 0        d[item] += 1    result_d = {key: val for key, val in d.iteritems() if val > 1}    return result_d.keys()

Only 2 loops, no very costly l.count() operations.

只有两个循环,没有非常昂贵的l.count()操作。

Here is a code to compare the methods for example. The code is below, here is the output:

这里有一个比较方法的代码。代码如下,输出如下:

dups_count: 13.368s # this is a function which uses l.count()dups_count_dict: 0.014s # this is a final best function (of the 3 functions)dups_count_counter: 0.024s # collections.Counter

The testing code:

测试代码:

import numpy as npfrom time import timefrom collections import Counterclass TimerCounter(object):    def __init__(self):        self._time_sum = 0    def start(self):        self.time = time()    def stop(self):        self._time_sum += time() - self.time    def get_time_sum(self):        return self._time_sumdef dups_count(l):    return set([x for x in l if l.count(x) > 1])def dups_count_dict(l):    d = {}    for item in l:        if item not in d:            d[item] = 0        d[item] += 1    result_d = {key: val for key, val in d.iteritems() if val > 1}    return result_d.keys()def dups_counter(l):    counter = Counter(l)        result_d = {key: val for key, val in counter.iteritems() if val > 1}    return result_d.keys()def gen_array():    np.random.seed(17)    return list(np.random.randint(0, 5000, 10000))def assert_equal_results(*results):    primary_result = results[0]    other_results = results[1:]    for other_result in other_results:        assert set(primary_result) == set(other_result) and len(primary_result) == len(other_result)if __name__ == '__main__':    dups_count_time = TimerCounter()    dups_count_dict_time = TimerCounter()    dups_count_counter = TimerCounter()    l = gen_array()    for i in range(3):        dups_count_time.start()        result1 = dups_count(l)        dups_count_time.stop()        dups_count_dict_time.start()        result2 = dups_count_dict(l)        dups_count_dict_time.stop()        dups_count_counter.start()        result3 = dups_counter(l)        dups_count_counter.stop()        assert_equal_results(result1, result2, result3)    print 'dups_count: %.3f' % dups_count_time.get_time_sum()    print 'dups_count_dict: %.3f' % dups_count_dict_time.get_time_sum()    print 'dups_count_counter: %.3f' % dups_count_counter.get_time_sum()

#21


-1  

One line solution:

一行的解决方案:

set([i for i in list if sum([1 for a in list if a == i]) > 1])

#22


-1  

this is the way I had to do it because I challenged myself not to use other methods:

这是我必须要做的,因为我要求自己不要使用其他方法:

def dupList(oldlist):    if type(oldlist)==type((2,2)):        oldlist=[x for x in oldlist]    newList=[]    newList=newList+oldlist    oldlist=oldlist    forbidden=[]    checkPoint=0    for i in range(len(oldlist)):        #print 'start i', i        if i in forbidden:            continue        else:            for j in range(len(oldlist)):                #print 'start j', j                if j in forbidden:                    continue                else:                    #print 'after Else'                    if i!=j:                         #print 'i,j', i,j                        #print oldlist                        #print newList                        if oldlist[j]==oldlist[i]:                            #print 'oldlist[i],oldlist[j]', oldlist[i],oldlist[j]                            forbidden.append(j)                            #print 'forbidden', forbidden                            del newList[j-checkPoint]                            #print newList                            checkPoint=checkPoint+1    return newList

so your sample works as:

所以你的样本是这样的:

>>>a = [1,2,3,3,3,4,5,6,6,7]>>>dupList(a)[1, 2, 3, 4, 5, 6, 7]

#23


-4  

Use the sort() function. Duplicates can be identified by looping over it and checking l1[i] == l1[i+1].

使用sort()函数。重复可以通过在其上循环和检查l1[i] = l1[i+1]来识别。