I would like to read several Excel files from a directory into pandas and concatenate them into one big dataframe, but I have not been able to figure it out. I need some help with the for loop and with building the concatenated dataframe. Here is what I have so far:
import sys
import csv
import glob
import pandas as pd
# get data file names
path = r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(path + "/*.xlsx")
dfs = []
for df in dfs:
    xl_file = pd.ExcelFile(filenames)
    df = xl_file.parse('Sheet1')
    dfs.concat(df, ignore_index=True)
2 Answers
#1
20
As mentioned in the comments, one error you are making is that you are looping over an empty list (dfs), so the body of the loop never runs.
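To make that concrete, here is a minimal sketch of the question's loop with the iteration fixed: it walks over the file names instead of the empty list, reads one workbook at a time, and calls pd.concat once at the end. The helper names read_sheet and concat_sheets, and the reader parameter, are illustrative assumptions rather than anything from the question; reader is there only so the loop can be tried without real Excel files.

```python
import glob
import os

import pandas as pd


def read_sheet(filename, sheet_name="Sheet1"):
    """Read one sheet of one workbook (needs an Excel engine such as openpyxl)."""
    return pd.read_excel(filename, sheet_name=sheet_name)


def concat_sheets(filenames, reader=read_sheet):
    """Read every file in `filenames` and stack the results into one dataframe."""
    dfs = []
    for filename in filenames:        # loop over the file names, not the empty list
        dfs.append(reader(filename))  # list.append collects the pieces
    if not dfs:
        return pd.DataFrame()         # pd.concat raises ValueError on an empty list
    return pd.concat(dfs, ignore_index=True)


# Usage with the directory from the question; if nothing matches the pattern,
# the result is simply an empty dataframe.
path = r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(os.path.join(path, "*.xlsx"))
big_df = concat_sheets(filenames)
```

Collecting the pieces in a list and concatenating once also avoids copying the growing dataframe on every pass through the loop.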
Here is how I would do it, using an example of 5 identical Excel files that are appended one after another.
(1) Imports:
import os
import pandas as pd
(2) List files:
path = os.getcwd()
files = os.listdir(path)
files
Output:
['.DS_Store',
'.ipynb_checkpoints',
'.localized',
'Screen Shot 2013-12-28 at 7.15.45 PM.png',
'test1 2.xls',
'test1 3.xls',
'test1 4.xls',
'test1 5.xls',
'test1.xls',
'Untitled0.ipynb',
'Werewolf Modelling',
'~$Random Numbers.xlsx']
(3) Pick out 'xls' files:
files_xls = [f for f in files if f[-3:] == 'xls']
files_xls
Output:
['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
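The slice test f[-3:] == 'xls' only matches the old .xls extension. Below is a sketch of a slightly more defensive filter, assuming you also want .xlsx files and want to skip the '~$' lock files that Excel creates while a workbook is open. The helper name excel_files and the shortened file list are made up for illustration.

```python
import os

EXCEL_SUFFIXES = {".xls", ".xlsx"}


def excel_files(names):
    """Keep names with an Excel extension, skipping Excel's '~$' lock files."""
    keep = []
    for name in names:
        suffix = os.path.splitext(name)[1].lower()
        if suffix in EXCEL_SUFFIXES and not name.startswith("~$"):
            keep.append(name)
    return keep


files = ['.DS_Store', 'test1 2.xls', 'test1.xls', 'Untitled0.ipynb',
         '~$Random Numbers.xlsx']
print(excel_files(files))  # ['test1 2.xls', 'test1.xls']
```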
(4) Initialize empty dataframe:
df = pd.DataFrame()
(5) Loop over the list of files, appending each one to the dataframe:
for f in files_xls:
    data = pd.read_excel(f, sheet_name='Sheet1')
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, data])
(6) Enjoy your new dataframe. :-)
df
Output:
Result Sample
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
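Note that the index in this output restarts at 0 for every file, because each file's rows keep their own index when they are stacked. If one continuous index is preferred, pd.concat accepts ignore_index=True. A small sketch, with two in-memory frames standing in for the Excel files (the data is invented for illustration):

```python
import pandas as pd

part = pd.DataFrame({"Result": ["a", "b"], "Sample": [1, 2]})

stacked = pd.concat([part, part])                        # keeps each piece's index
renumbered = pd.concat([part, part], ignore_index=True)  # builds a fresh 0..n-1 index

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Calling reset_index(drop=True) on an already combined dataframe gives the same renumbering after the fact.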
#2
1
This works with Python 2.x.
Run it from the directory where the Excel files are.
See http://pbpython.com/excel-file-combine.html
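import glob
import pandas as pd
# read every workbook in the current directory and stack the rows
all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    all_data = pd.concat([all_data, df], ignore_index=True)
# now save the data frame; the with-block writes and closes the file
with pd.ExcelWriter('output.xlsx') as writer:
    all_data.to_excel(writer, sheet_name='sheet1')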
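The snippet above depends on the script being run from the folder that holds the workbooks, since "*.xlsx" is resolved against the current directory. If changing directory is inconvenient, one option is to build the pattern from an explicit folder. This is only a sketch; the helper name workbooks_in and its parameters are assumptions for illustration.

```python
import glob
import os


def workbooks_in(directory, pattern="*.xlsx"):
    """Return the sorted full paths in `directory` that match `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Example: list the workbooks in the current directory.
for path in workbooks_in(os.getcwd()):
    print(path)
```

The returned paths include the directory, so they can be passed straight to pd.read_excel regardless of where the script is started.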