I have 2 pandas dataframes:
import pandas as pd

dept = pd.DataFrame({'dep_id': [1, 2], 'dep_name': ['shoes', 'giraffes']})
emp = pd.DataFrame({'dep_id': [1, 1, 2], 'emp_name': ['joe', 'bo', 'gigi']})
Joining them repeats the dept row for every matching row in emp, as is customary in relational joins:
pd.merge(dept, emp, on = 'dep_id')
   dep_id  dep_name emp_name
0       1     shoes      joe
1       1     shoes       bo
2       2  giraffes     gigi
Instead, I would like to create a hierarchical JSON, e.g.:
[
{ dep_name: 'shoes', emps: [{emp_name: 'joe'}, {emp_name: 'bo'}]},
{ dep_name: 'giraffes', emps: [{emp_name: 'gigi'}]}
]
What's an elegant way to do it? I can join and then groupby, but then it's impossible to tell which columns belong to the outer dep and which belong to the emps...
1 solution
#1
One possible solution is to select, inside apply, the columns that belong in emps and convert each group to a list of dicts:
# 'records' is spelled out: the short alias 'r' is no longer accepted by recent pandas
d = (pd.merge(dept, emp, on='dep_id')
       .groupby('dep_name')
       .apply(lambda x: x[['emp_name']].to_dict('records'))
       .reset_index(name='emps'))
print (d)
   dep_name                                       emps
0  giraffes                     [{'emp_name': 'gigi'}]
1     shoes  [{'emp_name': 'joe'}, {'emp_name': 'bo'}]
j = d.to_json(orient='records')
print (j)
[{"dep_name":"giraffes","emps":[{"emp_name":"gigi"}]},
{"dep_name":"shoes","emps":[{"emp_name":"joe"},{"emp_name":"bo"}]}]
# To nest more columns, add them to the selection (here dep_id as well)
d = (pd.merge(dept, emp, on='dep_id')
       .groupby('dep_name')
       .apply(lambda x: x[['dep_id', 'emp_name']].to_dict('records'))
       .reset_index(name='emps'))
print (d)
   dep_name                                               emps
0  giraffes                [{'dep_id': 2, 'emp_name': 'gigi'}]
1     shoes  [{'dep_id': 1, 'emp_name': 'joe'}, {'dep_id': ...
j = d.to_json(orient='records')
print (j)
[{"dep_name":"giraffes","emps":[{"dep_id":2,"emp_name":"gigi"}]},
{"dep_name":"shoes","emps":[{"dep_id":1,"emp_name":"joe"},{"dep_id":1,"emp_name":"bo"}]}]
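The same nesting can also be built without apply, by iterating over the groups and assembling plain Python objects. This is only a sketch of an alternative, not part of the original answer; the names merged and records are made up for illustration, and sort=False is used so the departments keep their order of first appearance.

```python
import json
import pandas as pd

dept = pd.DataFrame({'dep_id': [1, 2], 'dep_name': ['shoes', 'giraffes']})
emp = pd.DataFrame({'dep_id': [1, 1, 2], 'emp_name': ['joe', 'bo', 'gigi']})

merged = pd.merge(dept, emp, on='dep_id')

# One dict per department: the group key goes to the outer level,
# the selected columns of the group become the nested list of dicts.
records = [
    {'dep_name': name, 'emps': group[['emp_name']].to_dict('records')}
    for name, group in merged.groupby('dep_name', sort=False)
]

j = json.dumps(records)
print(j)
```

Here the split between outer and inner columns is explicit in the dict literal, which may be easier to read than the apply/reset_index chain when only a few columns are involved.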
EDIT1:
I think that to keep several columns at the outer level, outside the nested JSON, you need to group by all of them:
dept = pd.DataFrame({'dep_id': [1,2], 'dep_name':['shoes', 'giraffes'], 'def_size':[4,5]})
emp = pd.DataFrame({'dep_id': [1,1,2], 'emp_name': ['joe', 'bo', 'gigi']})
df = pd.merge(dept, emp, on = 'dep_id')
# def_size and dep_name stay as single outer-level columns
d = (df.groupby(['def_size', 'dep_name'])
       .apply(lambda x: x[['emp_name']].to_dict('records'))
       .reset_index(name='emps'))
print (d)
   def_size  dep_name                                       emps
0         4     shoes  [{'emp_name': 'joe'}, {'emp_name': 'bo'}]
1         5  giraffes                     [{'emp_name': 'gigi'}]
j = d.to_json(orient='records')
print (j)
[{"def_size":4,"dep_name":"shoes","emps":[{"emp_name":"joe"},{"emp_name":"bo"}]},
{"def_size":5,"dep_name":"giraffes","emps":[{"emp_name":"gigi"}]}]
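If listing the outer columns by hand becomes tedious, one option is to skip the merge and derive the split from the two original frames: every dept column stays outside, every emp column except the key goes inside. The following is a rough sketch under that assumption; nest is a hypothetical helper written for this example, not a pandas function.

```python
import json
import pandas as pd

def nest(parent, child, key, name):
    """Hypothetical helper: attach child rows to each parent row as a list of dicts."""
    inner_cols = [c for c in child.columns if c != key]
    # Map each key value to the list of its child records.
    nested = {
        k: g[inner_cols].to_dict('records')
        for k, g in child.groupby(key)
    }
    out = parent.copy()
    # Parents without any child rows get an empty list.
    out[name] = pd.Series([nested.get(k, []) for k in out[key]], index=out.index)
    return out

dept = pd.DataFrame({'dep_id': [1, 2], 'dep_name': ['shoes', 'giraffes'], 'def_size': [4, 5]})
emp = pd.DataFrame({'dep_id': [1, 1, 2], 'emp_name': ['joe', 'bo', 'gigi']})

d = nest(dept, emp, key='dep_id', name='emps')
j = d.drop(columns='dep_id').to_json(orient='records')
print(j)
```

Unlike the inner merge, this keeps departments that have no employees, with an empty emps list, and it preserves the row order of dept.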