Pandas to D3: serializing DataFrames to JSON

I have a DataFrame with the following columns and no duplicates:

['region', 'type', 'name', 'value']

These columns can be seen as a hierarchy, as follows:

grouped = df.groupby(['region','type', 'name'])

I would like to serialize this hierarchy as a JSON object.

If anyone is interested, the motivation behind this is to eventually put together a visualization like this one which requires a JSON file.

To do so, I need to convert grouped into the following:

new_data['children'][i]['name'] = region
new_data['children'][i]['children'][j]['name'] = type
new_data['children'][i]['children'][j]['children'][k]['name'] = name
new_data['children'][i]['children'][j]['children'][k]['size'] = value
...

where region, type and name correspond to the different levels of the hierarchy (indexed by i, j and k).

Is there an easy way in Pandas/Python to do this?

2 Solutions

#1

Something along these lines might get you there.

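import json
from collections import defaultdict

tree = lambda: defaultdict(tree)  # a recursive defaultdict: missing keys create nested dicts
d = tree()
# assumes df has exactly the four columns region, type, name, value, in that order
for _, (region, type_, name, value) in df.iterrows():  # type_ avoids shadowing the builtin
    d['children'][region]['name'] = region
    ...

json.dumps(d)  # defaultdict is a dict subclass, so it serializes directly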
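
Note that the loop above keys the children by region name, whereas the structure in the question indexes them by position (i, j, k), i.e. as lists. Here is a minimal sketch of one way to get lists, swapping the recursive defaultdict for plain lookup dicts. The function name build_tree, the root_name parameter and the sample rows are made up for illustration; with a DataFrame, something like df.itertuples(index=False, name=None) could supply the tuples.

```python
import json

def build_tree(rows, root_name="root"):
    """Nest (region, type, name, value) tuples into name/children dicts."""
    root = {"name": root_name, "children": []}
    region_nodes = {}  # region -> node already created for it
    type_nodes = {}    # (region, type) -> node already created for it
    for region, type_, name, value in rows:
        if region not in region_nodes:
            region_nodes[region] = {"name": region, "children": []}
            root["children"].append(region_nodes[region])
        key = (region, type_)
        if key not in type_nodes:
            type_nodes[key] = {"name": type_, "children": []}
            region_nodes[region]["children"].append(type_nodes[key])
        # leaves carry the value under "size", as in the question
        type_nodes[key]["children"].append({"name": name, "size": value})
    return root

# hypothetical sample data, not from the original post
rows = [
    ("East", "fruit", "apple", 3),
    ("East", "fruit", "pear", 5),
    ("East", "veg", "kale", 2),
    ("West", "fruit", "plum", 7),
]
tree = build_tree(rows)
print(json.dumps(tree, indent=2))
```

Children appear in the order their keys are first seen, so sorting the rows beforehand controls the output order.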

A vectorized solution would be better, and maybe something that takes advantage of the speed of groupby, but I can't think of such a solution.
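
For what it's worth, here is a rough sketch of a groupby-based version. It is not truly vectorized (it still loops over the groups in Python), and the names to_flare and root_name as well as the sample data are assumptions made for illustration only.

```python
import json
import pandas as pd

def to_flare(df, root_name="flare"):
    """Build nested name/children dicts from region/type/name/value columns."""
    root = {"name": root_name, "children": []}
    for region, region_df in df.groupby("region", sort=False):
        region_node = {"name": region, "children": []}
        for type_, type_df in region_df.groupby("type", sort=False):
            # .tolist() yields plain Python scalars, which json can serialize
            leaves = [
                {"name": name, "size": value}
                for name, value in zip(type_df["name"].tolist(),
                                       type_df["value"].tolist())
            ]
            region_node["children"].append({"name": type_, "children": leaves})
        root["children"].append(region_node)
    return root

# hypothetical sample data, not from the original post
df = pd.DataFrame({
    "region": ["East", "East", "East", "West"],
    "type": ["fruit", "fruit", "veg", "fruit"],
    "name": ["apple", "pear", "kale", "plum"],
    "value": [3, 5, 2, 7],
})
flare = to_flare(df)
print(json.dumps(flare, indent=2))
```

Passing sort=False keeps the groups in order of first appearance; drop it to get them sorted by key.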

Also take a look at df.groupby(...).groups, which returns a dict.
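
As a small illustration of what .groups holds, assuming made-up sample data: each key is a group key (a tuple when grouping by several columns) and each value holds the index labels of the rows in that group.

```python
import pandas as pd

# hypothetical sample data, not from the original post
df = pd.DataFrame({
    "region": ["East", "East", "East", "West"],
    "type": ["fruit", "fruit", "veg", "fruit"],
    "name": ["apple", "pear", "kale", "plum"],
    "value": [3, 5, 2, 7],
})

groups = df.groupby(["region", "type"]).groups
for (region, type_), labels in groups.items():
    # labels are row index labels; df.loc[labels] retrieves the group's rows
    print(region, type_, list(labels))
```

Those labels can then be used with df.loc to pull out the names and values for each leaf level.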

#2

Here's another script to take a pandas df and output a flare.json file: https://github.com/andrewheekin/csv2flare.json
