I wrote a loop that generates a DataFrame from each JSON file in a folder:
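import os
import json
import pandas as pd

for filename in os.listdir('json1'):
    with open(os.path.join('json1', filename), 'r') as json_data:
        d = json.load(json_data)
    # pd.json_normalize replaces pd.io.json.json_normalize (removed in pandas 2.0)
    df2 = pd.json_normalize(d)
    df2.columns = df2.columns.map(lambda x: x.split(".")[-1])  # keep last part of dotted names
    df3 = pd.json_normalize(d['Reviews'])  # one row per review
    df3.columns = df3.columns.map(lambda x: x.split(".")[-1])
    # repeat the top-level row once per review, then join the review columns
    df4 = pd.concat([df2] * df3.shape[0], ignore_index=True)
    df5 = df4.join(df3)
    print(df5)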
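To see what the loop body produces for a single file, here is a minimal sketch run on a hypothetical in-memory record. Only the Reviews key comes from the question; the fields Name, Address.City, Rating and Text are invented for illustration. It uses pd.json_normalize, the top-level name for pd.io.json.json_normalize since pandas 1.0.

```python
import pandas as pd

# Hypothetical record shaped like one JSON file: top-level fields plus a "Reviews" list
record = {
    "Name": "Cafe A",
    "Address": {"City": "Paris"},
    "Reviews": [
        {"Rating": 5, "Text": "Great"},
        {"Rating": 3, "Text": "OK"},
    ],
}

# One row for the top-level fields; nested dicts become dotted names like "Address.City",
# while the "Reviews" list is kept as a single object column
df2 = pd.json_normalize(record)
df2.columns = df2.columns.map(lambda x: x.split(".")[-1])  # "Address.City" -> "City"

# One row per review
df3 = pd.json_normalize(record["Reviews"])
df3.columns = df3.columns.map(lambda x: x.split(".")[-1])

# Repeat the top-level row once per review, then join side by side on the 0..n-1 index
df4 = pd.concat([df2] * df3.shape[0], ignore_index=True)
df5 = df4.join(df3)
print(df5)
```

Each file therefore yields one row per review, with the top-level fields repeated on every row.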
The printed output contains the DataFrame generated for each JSON file in the folder. However, I am wondering how I can combine all of these DataFrames into a single big DataFrame. They all have similar column headers, but these may differ slightly.
Answer:
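Try the following approach: wrap the per-file code in a function that returns df5, then combine the results with pd.concat. Columns are matched by name, so a column that exists in only some files is kept and filled with NaN where it is missing.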
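import os
import pandas as pd
def my_read_json(filename, **kwargs):
    # ... same per-file steps as the question's loop body, ending with df5
    return df5
files = [os.path.join('json1', f) for f in os.listdir('json1')]
df = pd.concat([my_read_json(f) for f in files], ignore_index=True)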
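The body of my_read_json is elided above. A fuller sketch, assuming the per-file steps are exactly the question's loop body (using pd.json_normalize), might look like this. The read_folder helper, the .json filename filter, the sorting, and the two sample records are hypothetical additions for illustration only.

```python
import json
import os
import tempfile

import pandas as pd


def my_read_json(filename):
    """Build the per-file DataFrame (df5 in the question) for one JSON file."""
    with open(filename, "r") as json_data:
        d = json.load(json_data)
    df2 = pd.json_normalize(d)
    df2.columns = df2.columns.map(lambda x: x.split(".")[-1])
    df3 = pd.json_normalize(d["Reviews"])
    df3.columns = df3.columns.map(lambda x: x.split(".")[-1])
    df4 = pd.concat([df2] * df3.shape[0], ignore_index=True)
    return df4.join(df3)


def read_folder(folder):
    """Hypothetical helper: stack the per-file frames of every .json file in folder."""
    # Sorted for a deterministic row order, since os.listdir order is arbitrary
    files = sorted(
        os.path.join(folder, f) for f in os.listdir(folder) if f.endswith(".json")
    )
    # concat takes the union of columns; values missing from a file become NaN
    return pd.concat([my_read_json(f) for f in files], ignore_index=True)


# Demo with two hypothetical files whose review fields differ slightly
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "a.json": {"Name": "Cafe A", "Reviews": [{"Rating": 5}, {"Rating": 4}]},
        "b.json": {"Name": "Cafe B", "Reviews": [{"Rating": 2, "Author": "Bo"}]},
    }
    for fname, record in samples.items():
        with open(os.path.join(folder, fname), "w") as fh:
            json.dump(record, fh)
    df = read_folder(folder)

print(df)
```

Because pd.concat defaults to join='outer', the Author column from b.json is kept, and the two rows from a.json get NaN in it. Passing join='inner' would instead keep only the columns common to every file.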