Why is the column order changing when appending pandas DataFrames?

I want to append (merge) all the csv files in a folder using Python pandas.

For example, say the folder has two csv files, test1.csv and test2.csv, as follows:

A_Id    P_Id    CN1         CN2         CN3
AAA     111     702         709         740
BBB     222     1727        1734        1778

and

A_Id    P_Id    CN1         CN2         CN3
CCC     333     710        750          750
DDD     444     180        734          778

So the python script I wrote was as follows:

#!/usr/bin/python
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)

all_data.to_csv('testfolder/combined.csv')

Though the combined.csv seems to have all the appended rows, it looks as follows:

      CN1       CN2         CN3    A_Id    P_Id
  0   710      750         750     CCC     333
  1   180       734         778     DDD     444     
  0   702       709         740     AAA     111
  1  1727       1734        1778    BBB     222

Whereas it should look like this:

A_Id   P_Id   CN1    CN2    CN3
AAA    111    702    709    740
BBB    222    1727   1734   1778
CCC    333    110    356    123
DDD    444    220    256    223

Why are the first two columns moved to the end?
Why is it appending in the first line rather than at the last line?

What am I missing? And how can I get rid of the 0s and 1s in the first column?

P.S: Since these are large csv files, I thought of using pandas.

3 solutions

#1

Try this:

all_data = all_data.append(df)[df.columns.tolist()]
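
A note for anyone on a newer pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the line above raises AttributeError there. Below is a minimal sketch of the same idea with pd.concat. The two small frames are made up for illustration (values borrowed from the question, and the mismatched column order is an assumption so the reordering is visible):

```python
import pandas as pd

# Hypothetical stand-ins for the accumulated frame and the frame just read.
all_data = pd.DataFrame({"CN1": [702], "A_Id": ["AAA"], "P_Id": [111]})
df = pd.DataFrame({"A_Id": ["CCC"], "P_Id": [333], "CN1": [710]})

# pd.concat aligns on column names; selecting with df's column list
# then forces the result into df's column order.
all_data = pd.concat([all_data, df])[df.columns.tolist()]

print(all_data.columns.tolist())  # ['A_Id', 'P_Id', 'CN1']
```

Selecting with a list of column names only reorders (or subsets) columns, so this works as long as every file has the same set of column names.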

#2

I had the same issue and it was painful. I managed to solve it by reordering the columns to match the source dataframe after it was appended to the final dataframe. It would look like this:

#!/usr/bin/python
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)
    all_data = all_data[df.columns]

all_data.to_csv('testfolder/combined.csv')

Since your question is almost two years old, I'm posting the solution that worked for me for anyone else who faces a similar issue.
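
For reference, here is a rough sketch of the same approach for pandas versions where DataFrame.append no longer exists. It assumes every file has the same column names and takes the column order from the first file (in sorted filename order). The function name combine_csvs is my own, not part of pandas:

```python
import glob

import pandas as pd


def combine_csvs(pattern):
    """Stack all CSV files matching `pattern`, keeping the first file's column order."""
    # glob does not promise any particular order, so sort for a predictable result.
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")

    frames = [pd.read_csv(path) for path in paths]

    # ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 0, 1.
    combined = pd.concat(frames, ignore_index=True)

    # Reorder explicitly so the result follows the first file's header.
    return combined[frames[0].columns.tolist()]
```

Usage would be something like combine_csvs("testfolder/*.csv").to_csv("combined.csv", index=False). Writing the output outside the folder being globbed avoids reading combined.csv back in as an input on the next run.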

#3

I tweaked the code as below. Comments are inline.

#!/usr/bin/python
import pandas as pd
import glob

# Grab all the csv files in the folder into a list. glob does not guarantee
# any particular order, so sort by name to get a predictable row order.
fileList = sorted(glob.glob('input_folder/*.csv'))

# Initialize an empty list to collect the dataframes.
dfList = []

for path in fileList:
    # header=0 takes the column names from the first row of each file
    # (header=False is rejected by read_csv).
    df = pd.read_csv(path, index_col=None, header=0)
    dfList.append(df)

# Concatenate the frames in list order, i.e. in sorted filename order.
CombinedFrame = pd.concat(dfList)

# The "Combined.csv" file will have combination of all the files.
CombinedFrame.to_csv('output_folder/Combined.csv', index=False)
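
On the 0s and 1s in the first column: those are each file's own row index, which to_csv writes out by default. A small sketch with the question's data typed in by hand (no files involved) shows the difference between the default, ignore_index=True, and index=False:

```python
import pandas as pd

# The two sample files from the question, built in memory.
first = pd.DataFrame({
    "A_Id": ["AAA", "BBB"], "P_Id": [111, 222],
    "CN1": [702, 1727], "CN2": [709, 1734], "CN3": [740, 1778],
})
second = pd.DataFrame({
    "A_Id": ["CCC", "DDD"], "P_Id": [333, 444],
    "CN1": [710, 180], "CN2": [750, 734], "CN3": [750, 778],
})

# By default each frame keeps its own row labels.
stacked = pd.concat([first, second])
print(stacked.index.tolist())     # [0, 1, 0, 1]

# ignore_index=True renumbers the rows across the combined frame.
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# index=False leaves the row labels out of the CSV entirely.
csv_text = renumbered.to_csv(index=False)
print(csv_text.splitlines()[0])   # A_Id,P_Id,CN1,CN2,CN3
```

If the index carries no meaning, index=False on to_csv alone is enough. ignore_index=True only matters if you keep working with the combined frame in memory.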
