Import multiple Excel files into Python pandas and concatenate them into one dataframe

Time: 2021-05-19 15:48:48

I would like to read several Excel files from a directory into pandas and concatenate them into one big dataframe, but I have not been able to figure it out. I need some help with the for loop and with building the concatenated dataframe. Here is what I have so far:


import sys
import csv
import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(path + "/*.xlsx")

dfs = []

for df in dfs: 
    xl_file = pd.ExcelFile(filenames)
    dfs.concat(df, ignore_index=True)

2 solutions



As mentioned in the comments, one error you are making is that you are looping over an empty list: dfs is still empty when the for loop starts, so the loop body never runs.
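For comparison, here is a minimal sketch of the question's own glob-based approach with its bugs fixed. Besides the empty-list loop, pd.ExcelFile was given the whole list of file names although it expects a single file, and a Python list has no concat method. The helper name combine_excel_files is hypothetical, and the sketch assumes every workbook keeps its data on the first sheet and that an empty DataFrame is an acceptable result when no files match.

```python
import glob
import os

import pandas as pd


def combine_excel_files(path, pattern="*.xlsx"):
    """Hypothetical helper: stack every workbook in `path` into one DataFrame.

    Assumes each file's data sits on its first sheet (read_excel's default).
    """
    filenames = sorted(glob.glob(os.path.join(path, pattern)))
    dfs = []
    # Loop over the file names, not over `dfs` (which starts out empty).
    for filename in filenames:
        # read_excel reads one file per call, so pass a single name each time.
        dfs.append(pd.read_excel(filename))
    if not dfs:
        # pd.concat raises ValueError when given nothing to concatenate.
        return pd.DataFrame()
    # Lists have no .concat method; pd.concat joins all frames in one call.
    return pd.concat(dfs, ignore_index=True)


# Usage with the directory from the question:
# combined = combine_excel_files(r'C:\DRO\DCL_rawdata_files\excelfiles')
```

Collecting the frames in a list and calling pd.concat once avoids copying the growing result on every iteration.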


Here is how I would do it, using an example with 5 identical Excel files that are appended one after another.


(1) Imports:


import os
import pandas as pd

(2) List files:


path = os.getcwd()
files = os.listdir(path)
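os.listdir returns bare file names, not full paths, so pd.read_excel(f) in step (5) finds the files only because path is the current working directory. If the workbooks live elsewhere, a minimal sketch (list_full_paths is a hypothetical helper name) joins the directory back onto each name:

```python
import os


def list_full_paths(path):
    """Hypothetical helper: return the full path of every entry in `path`."""
    # os.listdir yields bare names; os.path.join restores the directory part
    return sorted(os.path.join(path, name) for name in os.listdir(path))


# Usage: files = list_full_paths(r'C:\DRO\DCL_rawdata_files\excelfiles')
```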



[...,
 'Screen Shot 2013-12-28 at 7.15.45 PM.png',
 'test1 2.xls',
 'test1 3.xls',
 'test1 4.xls',
 'test1 5.xls',
 'Werewolf Modelling',
 '~$Random Numbers.xlsx']

(3) Pick out 'xls' files:


files_xls = [f for f in files if f[-3:] == 'xls']



['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
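The slice test f[-3:] == 'xls' matches only names ending in xls, so .xlsx workbooks (like the ones the question globs for) are skipped. A small sketch that accepts both extensions, ignores case, and drops the ~$ owner files Excel creates next to an open workbook (such as '~$Random Numbers.xlsx' in the listing above). The name pick_excel_files and its default extension tuple are assumptions for illustration:

```python
def pick_excel_files(names, extensions=(".xls", ".xlsx")):
    """Hypothetical helper: keep Excel workbooks from a directory listing."""
    return [
        name
        for name in names
        # str.endswith accepts a tuple; lower() also catches '.XLS' etc.
        if name.lower().endswith(extensions)
        # Skip the '~$' owner/lock files Excel leaves beside open workbooks.
        and not name.startswith("~$")
    ]


listing = ['Screen Shot 2013-12-28 at 7.15.45 PM.png', 'test1 2.xls',
           'Werewolf Modelling', '~$Random Numbers.xlsx', 'test1.xls']
print(pick_excel_files(listing))  # ['test1 2.xls', 'test1.xls']
```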

(4) Initialize empty dataframe:


df = pd.DataFrame()

(5) Loop over the list of files, appending each one to the dataframe:


for f in files_xls:
    data = pd.read_excel(f, 'Sheet1')
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, data])

(6) Enjoy your new dataframe. :-)





  Result  Sample
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
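The index in the output above runs 0 to 9 five times because each file's frame keeps its own row labels when stacked. A minimal sketch with two small in-memory frames (standing in for two files read with pd.read_excel; the values and key names are illustrative) shows the default, the ignore_index=True renumbering, and the keys= option that records which file each row came from:

```python
import pandas as pd

# Two small frames standing in for the results of pd.read_excel on two files
part1 = pd.DataFrame({"Result": ["a", "b"], "Sample": [1, 2]})
part2 = pd.DataFrame({"Result": ["c", "d"], "Sample": [3, 4]})

# Default: each frame keeps its own index, so the labels repeat
stacked = pd.concat([part1, part2])

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([part1, part2], ignore_index=True)

# keys= adds an outer index level naming the source of each row
labelled = pd.concat([part1, part2], keys=["test1.xls", "test1 2.xls"])

print(list(stacked.index))     # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
print(labelled.loc["test1 2.xls"])  # the rows that came from part2
```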



This works with Python 2.x.


Run it from the directory where the Excel files are.


See http://pbpython.com/excel-file-combine.html


import glob

import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    all_data = pd.concat([all_data, df], ignore_index=True)

# now save the data frame
with pd.ExcelWriter('output.xlsx') as writer:
    all_data.to_excel(writer, index=False)


