I want to append (merge) all the csv files in a folder using Python pandas.
For example, say the folder has two csv files, test1.csv and test2.csv, as follows:
A_Id P_Id CN1 CN2 CN3
AAA 111 702 709 740
BBB 222 1727 1734 1778
and
A_Id P_Id CN1 CN2 CN3
CCC 333 710 750 750
DDD 444 180 734 778
So the python script I wrote was as follows:
#!/usr/bin/python
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)
all_data.to_csv('testfolder/combined.csv')
Though combined.csv seems to have all the appended rows, it looks as follows:
CN1 CN2 CN3 A_Id P_Id
0 710 750 750 CCC 333
1 180 734 778 DDD 444
0 702 709 740 AAA 111
1 1727 1734 1778 BBB 222
Whereas it should look like this:
A_Id P_Id CN1 CN2 CN3
AAA 111 702 709 740
BBB 222 1727 1734 1778
CCC 333 710 750 750
DDD 444 180 734 778
- Why are the first two columns moved to the end?
- Why are the rows from the second file placed at the top rather than at the end?
What am I missing? And how can I get rid of the 0s and 1s in the first column?
P.S: Since these are large csv files, I thought of using pandas.
3 Solutions
#1
10
Try this:
all_data = all_data.append(df)[df.columns.tolist()]
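Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the one-liner above only runs on older versions. A minimal sketch of the same idea with `pd.concat`, using two small in-memory frames as stand-ins for test1.csv and test2.csv (the names `frames`, `column_order` and `all_data` are only illustrative):

```python
import pandas as pd

# Stand-ins for the two files from the question.
df1 = pd.DataFrame({"A_Id": ["AAA", "BBB"], "P_Id": [111, 222],
                    "CN1": [702, 1727], "CN2": [709, 1734], "CN3": [740, 1778]})
df2 = pd.DataFrame({"A_Id": ["CCC", "DDD"], "P_Id": [333, 444],
                    "CN1": [710, 180], "CN2": [750, 734], "CN3": [750, 778]})

frames = [df1, df2]

# Take the column order from the first frame and enforce it on the result.
column_order = frames[0].columns.tolist()
all_data = pd.concat(frames)[column_order]

print(all_data.columns.tolist())
```

Selecting the columns once after concatenating does the same job as reselecting them on every loop iteration, and it avoids repeatedly copying a growing frame.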
#2
2
I had the same issue and it was painful. I managed to solve it by reordering the columns to match the source dataframe after it was appended to the final dataframe. It looks like this:
#!/usr/bin/python
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)
    all_data = all_data[df.columns]
all_data.to_csv('testfolder/combined.csv')
Since your question is almost two years old, I'm posting the solution that worked for me for anyone else who faces a similar issue.
#3
0
I tweaked the code as below, with comments inline.
#!/usr/bin/python
import pandas as pd
import glob

# Grab all the csv files in the folder into a list.
# glob returns files in arbitrary order, so sort them to get a predictable row order.
fileList = sorted(glob.glob('input_folder/*.csv'))

# Initialize an empty list to collect the dataframes.
dfList = []
for files in fileList:
    # header=0 uses the first row of each file as the column names.
    df = pd.read_csv(files, index_col=None, header=0)
    dfList.append(df)

# Concatenate all the frames in one call.
CombinedFrame = pd.concat(dfList)

# The "Combined.csv" file will have the combination of all the files.
CombinedFrame.to_csv('output_folder/Combined.csv', index=False)
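Putting the points from this thread together, here is a hedged sketch of a reusable helper. The function name `combine_csv_folder` and the throwaway demo files are hypothetical, and the files are assumed to be comma-separated with identical headers. It sorts the file list (glob makes no ordering promise), skips the output file if it already sits in the input folder from a previous run, and writes the result without the index column:

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csv_folder(input_folder, output_path):
    """Concatenate every .csv in input_folder, in file-name order, into output_path."""
    # glob gives no ordering guarantee, so sort for a predictable row order.
    paths = sorted(glob.glob(os.path.join(input_folder, "*.csv")))
    # Skip the output file in case it lives in the input folder from an earlier run.
    paths = [p for p in paths if os.path.abspath(p) != os.path.abspath(output_path)]
    if not paths:
        raise ValueError("no csv files found in " + input_folder)
    frames = [pd.read_csv(p) for p in paths]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, index=False)
    return combined


# Small demonstration with throwaway files mirroring the question's data.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "test1.csv"), "w") as fh:
        fh.write("A_Id,P_Id,CN1,CN2,CN3\nAAA,111,702,709,740\nBBB,222,1727,1734,1778\n")
    with open(os.path.join(folder, "test2.csv"), "w") as fh:
        fh.write("A_Id,P_Id,CN1,CN2,CN3\nCCC,333,710,750,750\nDDD,444,180,734,778\n")
    result = combine_csv_folder(folder, os.path.join(folder, "combined.csv"))
    print(result)
```

This version holds every file in memory at once. For files too large for that, one option is to write each frame to the output as it is read, using `to_csv` with `mode='a'` and a header only for the first file.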