I have a table stored in an Excel file as follows:
Species        Garden  Hedgerow  Parkland  Pasture  Woodland
Blackbird      47      10        40        2        2
Chaffinch      19      3         5         0        2
Great Tit      50      0         10        7        0
House Sparrow  46      16        8         4        0
Robin          9       3         0         0        2
Song Thrush    4       0         6         0        0
I am using the xlrd Python library for reading these data. I have no problem reading it into a list of lists (with each line of the table stored as a list), using the code below:
from xlrd import open_workbook

wb = open_workbook("Sample.xls")
headers = []
sdata = []
for s in wb.sheets():
    print("Sheet: %s" % s.name)
    if s.name.capitalize() == "Data":
        for row in range(s.nrows):
            values = []
            for col in range(s.ncols):
                data = s.cell(row, col).value
                if row == 0:
                    headers.append(data)
                else:
                    values.append(data)
            if row > 0:  # skip the header row so sdata holds only data rows
                sdata.append(values)
As is probably obvious, headers is a simple list storing the column headers and sdata contains the table data, stored as a list of lists. Here is what they look like:
headers:
[u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']
sdata:
[[u'Blackbird', 47.0, 10.0, 40.0, 2.0, 2.0], [u'Chaffinch', 19.0, 3.0, 5.0, 0.0, 2.0], [u'Great Tit', 50.0, 0.0, 10.0, 7.0, 0.0], [u'House Sparrow', 46.0, 16.0, 8.0, 4.0, 0.0], [u'Robin', 9.0, 3.0, 0.0, 0.0, 2.0], [u'Song Thrush', 4.0, 0.0, 6.0, 0.0, 0.0]]
But I want to store these data into a Python dictionary, with each column as the key for a list containing all values for each column. For example (only part of the data is shown to save space):
dict = {
    'Species': ['Blackbird', 'Chaffinch', 'Great Tit'],
    'Garden': [47, 19, 50],
    'Hedgerow': [10, 3, 0],
    'Parkland': [40, 5, 10],
    'Pasture': [2, 0, 7],
    'Woodland': [2, 2, 0]
}
So, my question is: how can I achieve this? I know I could read the data by columns instead of by rows as in the code snippet above, but I could not figure out how to store the columns in a dictionary.
Thanks in advance for any assistance you can provide.
5 solutions
#1
2
Once you have the columns, it's fairly easy:
dict(zip(headers, sdata))
Actually, it looks like sdata in your example may be the row data. Even so, that's still fairly easy: you can transpose the table with zip as well:
dict(zip(headers, zip(*sdata)))
One of these two is what you are asking for.
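To see why the second form fits row data, here is a small self-contained sketch with a few values from the sample typed in by hand (no Excel file is read, and the name columns is introduced only for illustration). It is written for Python 3 and assumes every row has the same length as headers. Since zip(*sdata) yields tuples, each column is converted to a list.

```python
headers = ['Species', 'Garden', 'Hedgerow']
sdata = [
    ['Blackbird', 47.0, 10.0],
    ['Chaffinch', 19.0, 3.0],
    ['Great Tit', 50.0, 0.0],
]

# zip(*sdata) transposes rows into columns (as tuples); list() converts each one.
columns = {header: list(column) for header, column in zip(headers, zip(*sdata))}

print(columns['Species'])
print(columns['Garden'])
```

If sdata already held columns, the plain dict(zip(headers, sdata)) would be the one to use instead.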
#2
3
1. xlrd
I would highly recommend using defaultdict from the collections library. The value of each key is initialized with the default value, an empty list in this case. I did not add much exception handling here; you might want to add more based on your use case.
import xlrd
import sys
from collections import defaultdict

result = defaultdict(list)
workbook = xlrd.open_workbook("/Users/datafireball/Desktop/*.xlsx")
worksheet = workbook.sheet_by_name(workbook.sheet_names()[0])
headers = worksheet.row(0)
for index in range(1, worksheet.nrows):  # start at 1 to skip the header row
    try:
        for header, col in zip(headers, worksheet.row(index)):
            result[header.value].append(col.value)
    except Exception:
        print(sys.exc_info())
print(result)
Output:
defaultdict(<type 'list'>,
            {u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0],
             u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0],
             u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0],
             u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0],
             u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0],
             u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']})
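The defaultdict idea can be tried without an Excel file. The sketch below uses hand-typed stand-ins for the worksheet rows (the names headers and rows are only for illustration) and is written for Python 3; it shows that a key missing from result is created with an empty list the first time it is accessed.

```python
from collections import defaultdict

# Hand-typed stand-ins for what the worksheet rows would contain.
headers = ['Species', 'Garden', 'Hedgerow']
rows = [
    ['Blackbird', 47.0, 10.0],
    ['Chaffinch', 19.0, 3.0],
]

result = defaultdict(list)
for row in rows:
    for header, value in zip(headers, row):
        # A missing key is created with an empty list on first access.
        result[header].append(value)

print(dict(result))
```

Wrapping the result in dict() is optional; it only gives a plain dictionary for printing or for code that should not silently create new keys.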
2. Pandas
import pandas as pd

xl = pd.ExcelFile("/Users/datafireball/Desktop/*.xlsx")
df = xl.parse(xl.sheet_names[0])
print(df)
Output (a DataFrame gives you a great deal of flexibility for further processing):
         Species  Garden  Hedgerow  Parkland  Pasture  Woodland
0      Blackbird      47        10        40        2         2
1      Chaffinch      19         3         5        0         2
2      Great Tit      50         0        10        7         0
3  House Sparrow      46        16         8        4         0
4          Robin       9         3         0        0         2
5    Song Thrush       4         0         6        0         0
#3
2
I will contribute yet another answer to my own question!
Just after I posted my question, I discovered pyexcel, a neat little Python library that acts as a wrapper for other spreadsheet-handling packages (namely, xlrd and odfpy). It has a nice to_dict function which does exactly what I want (without even needing to transpose the table)!
Here is an example, using the data above:
from pyexcel import SeriesReader
from pyexcel.utils import to_dict

sheet = SeriesReader("Sample.xls")
print(sheet.series())  # just the headers, stored in a list
data = to_dict(sheet)
print(data)  # the full dataset, stored in a dictionary
Output:
[u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']
{u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0], u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0], u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0], u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0], u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0], u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']}
Hope it also helps!
#4
1
If xlrd doesn't solve your problem, consider looking at xlwings. One of the example videos shows how to take data from an Excel table and import it into a Pandas DataFrame, which would be more usable than a dictionary.
If you really want a dictionary, Pandas can convert to that easily, see here.
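For reference, a minimal sketch of that conversion, assuming pandas is installed. The DataFrame is built by hand here rather than read from Excel, only a few rows of the sample are used, and the name columns is only for illustration. DataFrame.to_dict(orient='list') returns a dictionary that maps each column name to a list of that column's values.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame that would come from the Excel sheet.
df = pd.DataFrame({
    'Species': ['Blackbird', 'Chaffinch', 'Great Tit'],
    'Garden': [47, 19, 50],
    'Hedgerow': [10, 3, 0],
})

# orient='list' gives {column name: [values in row order]}.
columns = df.to_dict(orient='list')
print(columns)
```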
#5
1
This script lets you transform Excel data into a list of dictionaries:
import xlrd

workbook = xlrd.open_workbook('Sample.xls', on_demand=True)
worksheet = workbook.sheet_by_index(0)

first_row = []  # the header row, which holds the column names
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))

# transform the worksheet into a list of dictionaries, one per data row
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)
print(data)
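Note that this produces one dictionary per row, while the question asks for one list per column. A small sketch of going from the first shape to the second, assuming every row dictionary has the same keys (the data list below is hand-typed, and the name columns is only for illustration):

```python
# Hand-typed stand-in for the list of row dictionaries built above.
data = [
    {'Species': 'Blackbird', 'Garden': 47.0},
    {'Species': 'Chaffinch', 'Garden': 19.0},
]

# Collect each key's values across all rows, preserving row order.
columns = {key: [row[key] for row in data] for key in data[0]}

print(columns)
```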