Suppose I have an array:
[['a', 10, 1, 0.1],
['a', 10, 2, 0.2],
['a', 20, 2, 0.3],
['b', 10, 1, 0.4],
['b', 20, 2, 0.5]]
And I want a dict (or JSON):
{
    'a': {
        10: {1: 0.1, 2: 0.2},
        20: {2: 0.3}
    },
    'b': {
        10: {1: 0.4},
        20: {2: 0.5}
    }
}
Is there a good way, or a library, for this task?
In this example the array has just 4 columns, but my original array is more complicated (7 columns).
Currently I implement this naively:
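import pandas as pd
# column names below are placeholders so that groupby('column1') can resolve
df = pd.DataFrame(array, columns=['column1', 'column2', 'column3', 'value'])
grouped1 = df.groupby('column1')
for column1 in grouped1.groups:
    group1 = grouped1.get_group(column1)
    grouped2 = group1.groupby('column2')
    for column2 in grouped2.groups:
        group2 = grouped2.get_group(column2)
        ...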
And the defaultdict way:
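from collections import defaultdict
# a default factory is called with no arguments, so the lambdas take none
d = defaultdict(lambda: defaultdict(lambda: defaultdict ... ))
for row in array:
    d[row[0]][row[1]][row[2]]... = row[-1]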
But I think neither is smart.
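For reference, the defaultdict idea can be made to run at any depth with a factory that returns itself. This is only a minimal sketch: the names `tree` and `to_plain_dict` are made up for illustration, and it assumes every row has at least two columns, with the last column holding the value.

```python
from collections import defaultdict

def tree():
    # Every missing key creates another tree, so the depth is not fixed.
    return defaultdict(tree)

def to_plain_dict(node):
    # Recursively turn the defaultdicts back into ordinary dicts.
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node

array = [['a', 10, 1, 0.1],
         ['a', 10, 2, 0.2],
         ['a', 20, 2, 0.3],
         ['b', 10, 1, 0.4],
         ['b', 20, 2, 0.5]]

d = tree()
for row in array:
    # All columns but the last two are a path; the last two are key and value.
    *keys, last_key, value = row
    node = d
    for key in keys:
        node = node[key]
    node[last_key] = value

nested = to_plain_dict(d)
print(nested)
```

Converting back to plain dicts at the end avoids accidentally creating keys later just by reading them.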
2 Solutions
#1
1
Introduction
Here is a recursive solution. The base case is when you have a list of 2-element lists (or tuples), in which case the dict constructor will do what we want:
>>> dict([(1, 0.1), (2, 0.2)])
{1: 0.1, 2: 0.2}
For other cases, we will remove the first column and recurse down until we get to the base case.
The code:
from itertools import groupby

def rows2dict(rows):
    if len(rows[0]) == 2:
        # e.g. [(1, 0.1), (2, 0.2)] ==> {1: 0.1, 2: 0.2}
        return dict(rows)
    else:
        dict_object = dict()
        # groupby only merges consecutive rows with the same first column
        for column1, grouped_rows in groupby(rows, lambda x: x[0]):
            rows_without_first_column = [x[1:] for x in grouped_rows]
            dict_object[column1] = rows2dict(rows_without_first_column)
        return dict_object

if __name__ == '__main__':
    rows = [['a', 10, 1, 0.1],
            ['a', 10, 2, 0.2],
            ['a', 20, 2, 0.3],
            ['b', 10, 1, 0.4],
            ['b', 20, 2, 0.5]]
    dict_object = rows2dict(rows)
    print(dict_object)
Output
{'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}}, 'b': {10: {1: 0.4}, 20: {2: 0.5}}}
Notes
- We use the itertools.groupby generator to simplify grouping of similar rows based on the first column
- For each group of rows, we remove the first column and recurse down
- This solution assumes that the rows variable has 2 or more columns. The result is unpredictable for rows which have 0 or 1 column.
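One more caveat worth illustrating: itertools.groupby only groups consecutive items, so rows that share a first column but are not adjacent end up in separate groups, and the later group overwrites the earlier one in the dict. A hedged sketch of a variant that sorts at each level before grouping follows. The name `rows2dict_sorted` is hypothetical, and it assumes the keys within one column can be compared with each other.

```python
from itertools import groupby

def rows2dict_sorted(rows):
    # Hypothetical variant of rows2dict that tolerates unsorted input.
    if len(rows[0]) == 2:
        return dict(rows)
    result = {}
    first_column = lambda row: row[0]
    # Sort on the first column so equal keys become adjacent for groupby.
    for key, group in groupby(sorted(rows, key=first_column), key=first_column):
        result[key] = rows2dict_sorted([row[1:] for row in group])
    return result

unsorted_rows = [['a', 10, 1, 0.1],
                 ['b', 10, 1, 0.4],
                 ['a', 10, 2, 0.2],
                 ['b', 20, 2, 0.5],
                 ['a', 20, 2, 0.3]]
nested_sorted = rows2dict_sorted(unsorted_rows)
print(nested_sorted)
```

Sorting costs O(n log n) per level, so for input that is already ordered the original version does less work.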
#2
4
I would suggest this rather simple solution:
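from functools import reduce

data = [['a', 10, 1, 0.1],
        ['a', 10, 2, 0.2],
        ['a', 20, 2, 0.3],
        ['b', 10, 1, 0.4],
        ['b', 20, 2, 0.5]]
result = dict()
for row in data:
    # walk (and create) the nested dicts for all but the last two columns,
    # then store the last column under the second-to-last one
    reduce(lambda v, k: v.setdefault(k, {}), row[:-2], result)[row[-2]] = row[-1]
print(result)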
{'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}}, 'b': {10: {1: 0.4}, 20: {2: 0.5}}}
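If the reduce one-liner is hard to read, it may help to see the same walk written as an explicit loop. This is just an unrolled sketch of the idea, not a different algorithm; the function name `build_nested` is made up for illustration, and rows are assumed to have at least two columns.

```python
def build_nested(rows):
    result = {}
    for row in rows:
        # Split the row into a path of keys, the final key, and the value.
        *path, last_key, value = row
        node = result
        for key in path:
            # setdefault returns the existing child dict or inserts a new one.
            node = node.setdefault(key, {})
        node[last_key] = value
    return result

sample = [['a', 10, 1, 0.1],
          ['a', 10, 2, 0.2],
          ['a', 20, 2, 0.3],
          ['b', 10, 1, 0.4],
          ['b', 20, 2, 0.5]]
built = build_nested(sample)
print(built)
```

Because each row is inserted on its own, the input does not need to be sorted, and the number of columns can vary from row to row.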
An actual recursive solution would be something like this:
def add_to_group(keys: list, group: dict):
    if len(keys) == 2:
        # base case: the last two items are the final key and its value
        group[keys[0]] = keys[1]
    else:
        add_to_group(keys[1:], group.setdefault(keys[0], dict()))

result = dict()
for row in data:
    add_to_group(row, result)
print(result)
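Since the question also mentions JSON, here is a small sketch of serializing the nested dict with the standard json module. One thing to be aware of: JSON object keys are always strings, so the integer keys come back as strings after a round trip.

```python
import json

result = {'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}},
          'b': {10: {1: 0.4}, 20: {2: 0.5}}}

text = json.dumps(result)      # integer keys are written as strings
restored = json.loads(text)    # so they are read back as strings too
print(text)
print(restored['a']['10']['1'])
```

If the integer keys matter after loading, they have to be converted back explicitly.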