Remove duplicate JSON objects from a list in Python

Time: 2022-09-10 20:29:57

I have a list of dicts where a particular value is repeated multiple times, and I would like to remove the duplicate values.

My list:

te = [
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      }
    ]

My function to remove duplicate values:

def removeduplicate(it):
    seen = set()
    for x in it:
        if x not in seen:
            yield x
            seen.add(x)

When I call this function, I get a generator object.

<generator object removeduplicate at 0x0170B6E8>

When I try to iterate over the generator, I get TypeError: unhashable type: 'dict'.

Is there a way to remove the duplicate values or to iterate over the generator?

3 Solutions

#1


9  

You can easily remove duplicates with a dictionary comprehension, since a dictionary does not allow duplicate keys, as below:

te = [
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
          "Name": "Bala1",
          "phone": "None"
      }      
    ]

unique = { each['Name'] : each for each in te }.values()

print unique

Output:

[{'phone': 'None', 'Name': 'Bala1'}, {'phone': 'None', 'Name': 'Bala'}]
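
If the intent is to treat two dicts as duplicates only when every field matches (not just Name), a hedged Python 3 sketch of the same idea can key the comprehension on all items; the sorted-items key below is an assumption, not part of the original answer:

# Key on every (key, value) pair so dicts are deduplicated on full content.
unique = list({tuple(sorted(d.items())): d for d in te}.values())
print(unique)
# Expected (Python 3.7+): [{'Name': 'Bala', 'phone': 'None'}, {'Name': 'Bala1', 'phone': 'None'}]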

#2


2  

That's because you can't add a dict to a set. From this question:

You're trying to use a dict as a key to another dict or in a set. That does not work because the keys have to be hashable.

As a general rule, only immutable objects (strings, integers, floats, frozensets, tuples of immutables) are hashable (though exceptions are possible).

>>> foo = dict()
>>> bar = set()
>>> bar.add(foo)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>> 
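
For contrast, an immutable tuple built from the same (empty) dict is hashable and can be added to the set without complaint (a minimal sketch, not part of the original answer):

>>> bar.add(tuple(foo.items()))
>>> bar
{()}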

You're already using if x not in seen, so just use a list instead:

>>> te = [
...       {
...         "Name": "Bala",
...         "phone": "None"
...       },
...       {
...         "Name": "Bala",
...         "phone": "None"
...       },
...       {
...         "Name": "Bala",
...         "phone": "None"
...       },
...       {
...         "Name": "Bala",
...         "phone": "None"
...       }
...     ]

>>> def removeduplicate(it):
...     seen = []
...     for x in it:
...         if x not in seen:
...             yield x
...             seen.append(x)

>>> removeduplicate(te)
<generator object removeduplicate at 0x7f3578c71ca8>

>>> list(removeduplicate(te))
[{'phone': 'None', 'Name': 'Bala'}]
>>> 
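
Since the data are JSON objects, the deduplicated list can be serialized back to JSON with the standard library; a minimal sketch, reusing the te list and generator defined above:

import json

deduped = list(removeduplicate(te))
# indent=2 is just an illustrative formatting choice
print(json.dumps(deduped, indent=2))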

#3


1  

You can still use a set for duplicate detection, you just need to convert the dictionary into something hashable such as a tuple. Your dictionaries can be converted to tuples by tuple(d.items()) where d is a dictionary. Applying that to your generator function:

def removeduplicate(it):
    seen = set()
    for x in it:
        t = tuple(x.items())
        if t not in seen:
            yield x
            seen.add(t)

>>> for d in removeduplicate(te):
...    print(d)
{'phone': 'None', 'Name': 'Bala'}

>>> te.append({'Name': 'Bala', 'phone': '1234567890'})
>>> te.append({'Name': 'Someone', 'phone': '1234567890'})

>>> for d in removeduplicate(te):
...    print(d)
{'phone': 'None', 'Name': 'Bala'}
{'phone': '1234567890', 'Name': 'Bala'}
{'phone': '1234567890', 'Name': 'Someone'}

This provides faster lookup (avg. O(1)) than a "seen" list (O(n)). Whether it is worth the extra computation of converting every dict into a tuple depends on the number of dictionaries that you have and how many duplicates there are. If there are a lot of duplicates, a "seen" list will grow quite large, and testing whether a dict has already been seen could become an expensive operation. This might justify the tuple conversion - you would have to test/profile it.
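
A minimal sketch of such a test with the standard library's timeit; the sample data, sizes, and repeat count below are illustrative assumptions only:

import random
import timeit

# Illustrative data: 10,000 dicts drawn from 100 distinct values, so heavily duplicated.
data = [{"Name": f"user{random.randrange(100)}", "phone": "None"} for _ in range(10_000)]

def dedup_with_list(it):
    seen = []
    for x in it:
        if x not in seen:
            yield x
            seen.append(x)

def dedup_with_set(it):
    seen = set()
    for x in it:
        t = tuple(x.items())
        if t not in seen:
            yield x
            seen.add(t)

print("list-based:", timeit.timeit(lambda: list(dedup_with_list(data)), number=10))
print("set-based: ", timeit.timeit(lambda: list(dedup_with_set(data)), number=10))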
