I'm able to scrape all the material I need; the problem is that I can't get the data into Excel.
from lxml import html
import requests
import xlsxwriter

page = requests.get('website that gets mined')
tree = html.fromstring(page.content)
# each xpath call returns a list of the matching text nodes
items = tree.xpath('//h4[@class="item-title"]/text()')
prices = tree.xpath('//span[@class="price"]/text()')
description = tree.xpath('//div[@class="description text"]/text()')
print 'items: ', items
print 'Prices: ', prices
print 'description: ', description
Everything works fine until this section, where I try to get the data into Excel. This is the error message:
for items,prices,description in (array):
ValueError: too many values to unpack
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x104735e10>> ignored
This is what the code was trying to do:
array = [items, prices, description]
workbook = xlsxwriter.Workbook('test1.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0
for items, prices, description in (array):
    worksheet.write(row, col, items)
    worksheet.write(row, col + 1, prices)
    worksheet.write(row, col + 2, description)
    row += 1
workbook.close()
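For context on the two error messages: array is a list of three lists, so the loop hands each inner list, one at a time, to the unpacking target items, prices, description, and any inner list whose length is not exactly 3 fails to unpack. A minimal sketch of the failure, with invented sample data:

items = ['hat', 'scarf', 'gloves', 'belt']   # pretend 4 titles were scraped
array = [items]
# Each pass of the loop tries to unpack ONE inner list into three names,
# so a 4-element list raises "ValueError: too many values to unpack".
for a, b, c in array:
    pass

The second message about the workbook destructor follows from the first: the ValueError propagates before workbook.close() is reached, so the Workbook object is garbage-collected while still open.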
2 Answers
#1
Assuming that items, prices, and description all have the same length, you could rewrite the final part of the code as:
for item, price, desc in zip(items, prices, description):
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, price)
    worksheet.write(row, col + 2, desc)
    row += 1
If the lists can have unequal lengths, you should look into alternatives to zip (see the sketch below), but I would be worried about the data consistency in that case.
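For reference, a minimal sketch of one such alternative, itertools.izip_longest (named zip_longest in Python 3), which pads the shorter lists instead of silently truncating; the sample lists here are invented for illustration:

from itertools import izip_longest  # itertools.zip_longest in Python 3
import xlsxwriter

items = ['hat', 'scarf', 'gloves']
prices = ['9.99', '14.99']           # deliberately one element short
description = ['wool', 'silk', 'leather']

workbook = xlsxwriter.Workbook('test1.xlsx')
worksheet = workbook.add_worksheet()
row = 0
# izip_longest runs to the end of the longest list and fills the missing
# slots with fillvalue, so no row of data is dropped.
for item, price, desc in izip_longest(items, prices, description, fillvalue=''):
    worksheet.write(row, 0, item)
    worksheet.write(row, 1, price)
    worksheet.write(row, 2, desc)
    row += 1
workbook.close()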
#2
Inevitably, it will be easier to write to a CSV file or a text file than to an Excel file.
import urllib2

listOfStocks = ["AAPL", "MSFT", "GOOG", "FB", "AMZN"]
urls = []
for company in listOfStocks:
    urls.append('http://real-chart.finance.yahoo.com/table.csv?s=' + company + '&d=6&e=28&f=2015&g=m&a=11&b=12&c=1980&ignore=.csv')

Output_File = open('C:/your_path_here/Data.csv', 'w')
New_Format_Data = ''
for counter in range(0, len(urls)):
    Original_Data = urllib2.urlopen(urls[counter]).read()
    if counter == 0:
        # take the header line once and prepend a Company column
        New_Format_Data = "Company," + urllib2.urlopen(urls[counter]).readline()
    rows = Original_Data.splitlines(1)  # keep the line endings
    for row in range(1, len(rows)):
        New_Format_Data = New_Format_Data + listOfStocks[counter] + ',' + rows[row]
Output_File.write(New_Format_Data)
Output_File.close()
OR
from bs4 import BeautifulSoup
import urllib2

var_file = urllib2.urlopen("http://www.imdb.com/chart/top")
var_html = var_file.read()
text_file = open("C:/your_path_here/Text1.txt", "wb")
var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        #print(link)
        z = str(link)
        text_file.write(z + "\r\n")
text_file.close()
As a developer, it's difficult to manipulate Excel files programmatically, since the Excel format is proprietary. This is especially true for languages other than .NET. CSV files, on the other hand, are easy to manipulate programmatically since they are, after all, simple text files.
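To connect this back to the original question, here is a minimal sketch of the CSV route using the standard csv module with the three scraped lists (the sample values are invented; in the real script they come from the xpath calls above). Excel opens the resulting file directly:

import csv

items = ['hat', 'scarf']
prices = ['9.99', '14.99']
description = ['wool', 'silk']

# 'wb' prevents the csv module from inserting blank rows on Windows (Python 2)
output = open('test1.csv', 'wb')
writer = csv.writer(output)
writer.writerow(['item', 'price', 'description'])   # header row
for row in zip(items, prices, description):
    writer.writerow(row)
output.close()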