I'm not sure how to fix this error. I assumed I need to add .encode('utf-8') somewhere, but I'm not sure whether that's the right fix or where to apply it.
The error is:
line 40, in <module>
writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 17: ordinal not in range(128)
This is the core of my Python script.
import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://dummysite'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('table', {'class': 'table'})

list_of_rows = []
for row in table.findAll('tr')[1:]:
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('[','').replace(']','')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

outfile = open("./test.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Name", "Location"])
writer.writerows(list_of_rows)
2 Answers
#1
18
The Python 2.x csv module is broken for Unicode text. You have three options, in order of complexity:
- Edit: see below. Use the fixed library https://github.com/jdunck/python-unicodecsv (pip install unicodecsv) as a drop-in replacement. Example:
    import unicodecsv
    with open("myfile.csv", 'rb') as my_file:
        r = unicodecsv.DictReader(my_file, encoding='utf-8')
- Read the CSV manual regarding Unicode: https://docs.python.org/2/library/csv.html (see the examples at the bottom).
- Manually encode each item as UTF-8:
    for cell in row.findAll('td'):
        text = cell.text.replace('[','').replace(']','')
        list_of_cells.append(text.encode("utf-8"))
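To illustrate the third option, here is a minimal sketch of why the ASCII codec rejects the en dash and what encoding to UTF-8 produces instead. The helper name encode_row and the sample row are hypothetical; they are not part of the original script.

```python
# -*- coding: utf-8 -*-
# Sketch only: encode_row and the sample data are hypothetical illustrations.

def encode_row(cells):
    """Return a copy of the row with every text cell encoded as UTF-8 bytes."""
    return [cell.encode("utf-8") for cell in cells]

row = [u"2010\u20132012", u"London"]  # \u2013 is an en dash

try:
    # Roughly what happens implicitly when Python 2's csv writer
    # receives a unicode object: it is converted with the ASCII codec.
    row[0].encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False  # the en dash has no ASCII representation

# Explicit UTF-8 encoding succeeds and yields byte strings.
encoded = encode_row(row)
```

On Python 2, byte strings like these can be handed to writer.writerows directly. On Python 3 the csv module expects text, so this step is not needed there.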
Edit: I found that python-unicodecsv is also broken when reading UTF-16. It complains about any 0x00 bytes.
Instead, use https://github.com/ryanhiebert/backports.csv, which more closely resembles the Python 3 implementation and uses the io module.
Install:
pip install backports.csv
Usage:
from backports import csv
import io
with io.open(filename, encoding='utf-8') as f:
    r = csv.reader(f)
#2
0
In addition to Alastair's excellent suggestions, I found the easiest option was to use Python 3 instead of Python 2. All my script required was changing wb to w in the open statement, in accordance with Python 3's syntax.
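For reference, a minimal Python 3 sketch of the writing step might look like the following. The sample row and the temporary file path are assumptions standing in for the scraped data and ./test.csv; passing encoding and newline explicitly is my addition rather than something the change from wb to w strictly requires.

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for the scraped list_of_rows.
list_of_rows = [["Caf\u00e9 2010\u20132012", "London"]]

# Temporary location used here instead of ./test.csv.
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# Text mode ("w") with an explicit encoding; newline="" lets the csv
# module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["Name", "Location"])
    writer.writerows(list_of_rows)

# Read the file back to confirm the non-ASCII characters survived.
with open(path, newline="", encoding="utf-8") as infile:
    rows_read = list(csv.reader(infile))
```

Stating the encoding explicitly avoids depending on the platform default, which is not UTF-8 everywhere.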