Assume I have the following dictionaries:
{name: "john", place: "nyc", owns: "gold", quantity: 30}
{name: "john", place: "nyc", owns: "silver", quantity: 20}
{name: "jane", place: "nyc", owns: "platinum", quantity: 5}
{name: "john", place: "chicago", owns: "brass", quantity: 60}
{name: "john", place: "chicago", owns: "silver", quantity: 40}
I have hundreds of these small dictionaries. I need to merge the ones that share a subset of common keys (in this example, name and place) into a new dictionary. Ultimately, the output should look like the following:
{name: "john", place: "nyc", gold: 30, silver: 20}
{name: "jane", place: "nyc", platinum: 5}
{name: "john", place: "chicago", brass: 60, silver: 40}
Is there an efficient way to do this? All I can think of is brute force: keep track of every possible name-place combination, store them in a list, then traverse the entire collection again for each combination and merge the matching dictionaries into a new one. Thanks!
5 Solutions
#1
7
First, getting the output that you asked for:
from collections import defaultdict
from itertools import chain

data = [{'name': "john", 'place': "nyc", 'owns': "gold", 'quantity': 30},
        {'name': "john", 'place': "nyc", 'owns': "silver", 'quantity': 20},
        {'name': "jane", 'place': "nyc", 'owns': "platinum", 'quantity': 5},
        {'name': "john", 'place': "chicago", 'owns': "brass", 'quantity': 60},
        {'name': "john", 'place': "chicago", 'owns': "silver", 'quantity': 40}]

# Collect (owns, quantity) pairs under each (name, place) key
accumulator = defaultdict(list)
for p in data:
    accumulator[p['name'], p['place']].append((p['owns'], p['quantity']))

# Rebuild one flat dict per group; .items() works on Python 2 and 3,
# though the order of the results can differ between versions
[dict(chain([('name', name), ('place', place)], rest)) for (name, place), rest in accumulator.items()]
Out[13]:
[{'name': 'jane', 'place': 'nyc', 'platinum': 5},
{'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
{'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}]
Now I have to point out that this list-of-dicts data structure you've asked for is super awkward. Dicts are great for lookups, but they perform best when you can just use one for the whole group of objects. If you have to linearly search through a bunch of dicts to find the one you want, you've immediately lost the whole benefit that dict provides in the first place. So that leaves us with a couple of options: go one level deeper and nest dicts within our dict, or use something else entirely.
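As a rough sketch of the "nest dicts within our dict" option, the grouped data can live in a single dict keyed by the (name, place) tuple, so finding one person's holdings is a direct lookup rather than a scan over a list. The name `holdings_by_person` is made up for this illustration.

```python
from collections import defaultdict

data = [
    {'name': "john", 'place': "nyc", 'owns': "gold", 'quantity': 30},
    {'name': "john", 'place': "nyc", 'owns': "silver", 'quantity': 20},
    {'name': "jane", 'place': "nyc", 'owns': "platinum", 'quantity': 5},
    {'name': "john", 'place': "chicago", 'owns': "brass", 'quantity': 60},
    {'name': "john", 'place': "chicago", 'owns': "silver", 'quantity': 40},
]

# Outer dict: (name, place) -> inner dict of {owns: quantity}
# (holdings_by_person is a hypothetical name, not from the original answer)
holdings_by_person = defaultdict(dict)
for p in data:
    holdings_by_person[p['name'], p['place']][p['owns']] = p['quantity']

# Direct lookup by key, no linear search through a list of dicts
print(holdings_by_person['john', 'nyc'])   # {'gold': 30, 'silver': 20}
print(holdings_by_person['jane', 'nyc'])   # {'platinum': 5}
```

The trade-off is that the result is no longer the flat list the question asked for, but it can be flattened later if needed.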
May I suggest making a list of meaningful objects which each represent one of these people? Either create your own class, or use a namedtuple:
from collections import namedtuple
Person = namedtuple('Person', 'name place holdings')
# Reuses the accumulator built above; .items() works on Python 2 and 3
[Person(name, place, dict(rest)) for (name, place), rest in accumulator.items()]
Out[17]:
[Person(name='jane', place='nyc', holdings={'platinum': 5}),
Person(name='john', place='chicago', holdings={'brass': 60, 'silver': 40}),
Person(name='john', place='nyc', holdings={'silver': 20, 'gold': 30})]
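For the "create your own class" route, one possible sketch uses a dataclass, assuming Python 3.7 or later. The `total_quantity` method is a hypothetical helper added only to show what a class offers over a plain namedtuple.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    place: str
    holdings: dict = field(default_factory=dict)

    def total_quantity(self):
        """Hypothetical helper: sum of all quantities, regardless of material."""
        return sum(self.holdings.values())


data = [
    {'name': "john", 'place': "nyc", 'owns': "gold", 'quantity': 30},
    {'name': "john", 'place': "nyc", 'owns': "silver", 'quantity': 20},
    {'name': "jane", 'place': "nyc", 'owns': "platinum", 'quantity': 5},
    {'name': "john", 'place': "chicago", 'owns': "brass", 'quantity': 60},
    {'name': "john", 'place': "chicago", 'owns': "silver", 'quantity': 40},
]

# Group holdings per (name, place), then wrap each group in a Person
grouped = defaultdict(dict)
for p in data:
    grouped[p['name'], p['place']][p['owns']] = p['quantity']

people = [Person(name, place, holdings) for (name, place), holdings in grouped.items()]
for person in people:
    print(person.name, person.place, person.total_quantity())
```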
#2
1
My personal strategy for this is roughly outlined below. Define a function that generates a key from a dict, then group the dicts in a separate dict under that key. Once you've iterated through all the elements and updated each group, simply return the .values() of the grouped dict.
dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]

def get_key(instance):
    return "%s-%s" % (instance.get("name"), instance.get("place"))

grouped = {}
for dict_ in dicts:
    grouped[get_key(dict_)] = grouped.get(get_key(dict_), {})
    # update() overwrites 'owns' and 'quantity', so only the last record per key survives
    grouped[get_key(dict_)].update(dict_)

print(list(grouped.values()))
# [
# {'owns': 'platinum', 'place': 'nyc', 'name': 'jane', 'quantity': 5},
# {'name': 'john', 'place': 'nyc', 'owns': 'silver', 'quantity': 20},
# {'name': 'john', 'place': 'chicago', 'owns': 'silver', 'quantity': 40}
# ]
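Note that the output above keeps only the last owns/quantity pair for each key, which is not quite the format the question asked for. A small variation on the same key-function idea, storing each holding under its own name, might look like the sketch below. The tuple key is my own substitution for the joined string; it assumes Python 3.7+ for the result order.

```python
dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]


def get_key(instance):
    # A tuple key cannot collide the way "a-b" + "c" and "a" + "b-c" can
    return (instance.get("name"), instance.get("place"))


grouped = {}
for dict_ in dicts:
    # Start each group with just the shared keys
    merged = grouped.setdefault(
        get_key(dict_), {"name": dict_["name"], "place": dict_["place"]}
    )
    # Store the holding under its own name instead of overwriting 'owns'
    merged[dict_["owns"]] = dict_["quantity"]

result = list(grouped.values())
print(result)
```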
#3
0
This is one way to do it:
dicts = [
{"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
{"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
{"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
{"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
{"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]
We create a transformed dict with place-name as key and output dict as the value:
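from pprint import pprint  # used to display the result

transformed_dict = {}
for a_dict in dicts:
    key = '{}-{}'.format(a_dict['place'], a_dict['name'])
    if key not in transformed_dict:
        # First record for this place-name pair: start a new output dict
        transformed_dict[key] = {'name': a_dict['name'], 'place': a_dict['place'], a_dict['owns']: a_dict['quantity']}
    else:
        # Later records only add their holding to the existing dict
        transformed_dict[key][a_dict['owns']] = a_dict['quantity']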
transformed_dict now looks like:
{'chicago-john': {'brass': 60,
                  'name': 'john',
                  'place': 'chicago',
                  'silver': 40},
 'nyc-jane': {'name': 'jane', 'place': 'nyc', 'platinum': 5},
 'nyc-john': {'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}}
pprint(list(transformed_dict.values())) gives what we want:
[{'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20},
{'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
{'name': 'jane', 'place': 'nyc', 'platinum': 5}]
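If the grouping keys vary from one data set to another, the same loop could be wrapped in a reusable function. The sketch below is one possible shape, not part of the original answer: `merge_records` and its parameter names are hypothetical. It also guards against one edge case of the flat output format, where a holding named like a grouping key (for example `owns: "name"`) would silently overwrite it.

```python
def merge_records(records, group_keys=("name", "place"),
                  label_key="owns", value_key="quantity"):
    """Merge records that share the same group_keys values into one flat dict each."""
    merged = {}
    for record in records:
        key = tuple(record[k] for k in group_keys)
        label = record[label_key]
        if label in group_keys:
            # A holding called e.g. "name" would clobber the real name field
            raise ValueError("label %r clashes with a grouping key" % (label,))
        target = merged.setdefault(key, {k: record[k] for k in group_keys})
        target[label] = record[value_key]
    return list(merged.values())


dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

result = merge_records(dicts)
print(result)
```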
#4
0
from itertools import groupby

# Assumes `data` is the list of input dicts (as defined in answer #1 above)
result, get_owns = [], lambda x: x["owns"]
get_details = lambda x: (x["name"], x["place"])

# Sort and group the data based on name and place
for key, grp in groupby(sorted(data, key=get_details), key=get_details):
    # Create a dictionary with the name and place
    temp = dict(zip(("name", "place"), key))
    # Sort and group the grouped data based on owns
    for owns, grp1 in groupby(sorted(grp, key=get_owns), key=get_owns):
        # For each material, find and add the sum of quantity in temp
        temp[owns] = sum(item["quantity"] for item in grp1)
    # Add the temp dictionary to the result :-)
    result.append(temp)

print(result)
Output
[{'name': 'jane', 'place': 'nyc', 'platinum': 5},
{'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
{'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}]
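The groupby approach needs the data sorted first, and it sums the quantities when the same (name, place, owns) combination appears more than once. A one-pass sketch with the same summing behavior, using nested `defaultdict`s instead of sorting, is shown below. The extra gold record is invented purely to show the summing; it is not in the original data.

```python
from collections import defaultdict

data = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
    # Made-up duplicate, only here to demonstrate the summing
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 12},
]

# (name, place) -> {owns: running total of quantity}
totals = defaultdict(lambda: defaultdict(int))
for item in data:
    totals[item["name"], item["place"]][item["owns"]] += item["quantity"]

# Flatten each group into the requested output shape
result = []
for (name, place), owned in totals.items():
    temp = {"name": name, "place": place}
    temp.update(owned)
    result.append(temp)

print(result)
```

This avoids the two sorts, at the cost of holding all running totals in memory at once.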
#5
0
Maybe a crazy idea, but how about a dict-of-dicts-of-dicts? This would work like a 2D array, with the names and places as the row and column indices.
my_dicts = [
{"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
{"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
{"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
{"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
{"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]
all_names = set(d["name"] for d in my_dicts)
all_places = set(d["place"] for d in my_dicts)
# Pre-build an empty cell for every (name, place) combination
merged = {name: {place: {} for place in all_places} for name in all_names}
for d in my_dicts:
    merged[d["name"]][d["place"]][d["owns"]] = d["quantity"]

import pprint
pprint.pprint(merged)
# {'jane': {'chicago': {}, 'nyc': {'platinum': 5}},
#  'john': {'chicago': {'brass': 60, 'silver': 40},
#           'nyc': {'gold': 30, 'silver': 20}}}
Then convert to your desired format:
# Skip (name, place) combinations that ended up with no holdings
new_dicts = [{"name": name, "place": place} for name in all_names for place in all_places if merged[name][place]]
for d in new_dicts:
    d.update(merged[d["name"]][d["place"]])
pprint.pprint(new_dicts)
# [{'name': 'jane', 'place': 'nyc', 'platinum': 5},
# {'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20},
# {'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40}]
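One side effect of pre-building the grid is that it allocates a cell for every name-place combination, including ones that never occur (such as jane in chicago above). A possible alternative, sketched here under the assumption of Python 3.5+ for the `**` unpacking, creates the inner levels only on first write by nesting `defaultdict`s.

```python
from collections import defaultdict

my_dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

# name -> place -> {owns: quantity}; inner levels appear only when written to
merged = defaultdict(lambda: defaultdict(dict))
for d in my_dicts:
    merged[d["name"]][d["place"]][d["owns"]] = d["quantity"]

# Every stored cell is non-empty, so no filtering is needed when flattening
new_dicts = [
    {"name": name, "place": place, **holdings}
    for name, places in merged.items()
    for place, holdings in places.items()
]
print(new_dicts)
```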