Here is my code:
for line in open('u.item'):
    pass  # read each line
Whenever I run this code, it gives the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte
I tried to solve this by adding an extra parameter to open(), so the code looks like this:
for line in open('u.item', encoding='utf-8'):
    pass  # read each line
But it gives the same error again. What should I do? Please help.
7 Solutions
#1
167
As suggested by Mark Ransom, I found the right encoding for that problem. The encoding was "ISO-8859-1", so replacing open("u.item", encoding="utf-8") with open('u.item', encoding="ISO-8859-1") will solve the problem.
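To see the difference without the original file, here is a minimal sketch that writes a temporary file in ISO-8859-1 and reads it back both ways. The sample line and the helper name read_lines are made up for illustration; only the two encoding names come from the question and the answer.

```python
import os
import tempfile

# Hypothetical sample line standing in for a row of u.item.
# In ISO-8859-1 the "é" is stored as the single byte 0xE9.
sample = "1|Les Misérables (1995)\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".item") as tmp:
    tmp.write(sample.encode("ISO-8859-1"))
    path = tmp.name


def read_lines(path, encoding):
    """Return all lines of the file, decoded with the given encoding."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Reading as UTF-8 fails, because 0xE9 is not followed by a continuation byte.
try:
    read_lines(path, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Reading as ISO-8859-1 succeeds and restores the original text.
lines = read_lines(path, "ISO-8859-1")

os.remove(path)
```

Using a with block also closes the file when the loop is done, which the bare for line in open(...) form leaves to the garbage collector.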
#2
18
Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open call.
In Windows-1252 encoding, for example, the byte 0xe9 would be the character é.
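As a rough illustration of that point, the snippet below decodes the same bytes under three codecs. The byte string is an invented example, not data from u.item.

```python
# Invented sample bytes: "café au lait" as a single-byte encoding stores it.
raw = b"caf\xe9 au lait"

# Windows-1252 and ISO-8859-1 both map the byte 0xE9 to "é".
as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("iso-8859-1")

# UTF-8 encodes "é" as two bytes, so a lone 0xE9 is not valid UTF-8.
utf8_bytes = "é".encode("utf-8")

try:
    raw.decode("utf-8")
    error_position = None
    error_reason = None
except UnicodeDecodeError as exc:
    error_position = exc.start   # index of the offending byte
    error_reason = exc.reason    # short description from the codec
```

One caveat: ISO-8859-1 assigns a character to every possible byte, so decoding with it never raises an error. A successful read therefore does not prove the guess was right; it is still worth checking that the accented characters look sensible.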
#3
9
Try reading it with pandas:
pd.read_csv('u.item', sep='|', names=m_cols , encoding='latin-1')
#4
6
If you are using Python 2, the following will be the solution:
import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    pass  # do something
The encoding parameter doesn't work with the built-in open() in Python 2, so without io.open you will get the following error:
TypeError: 'encoding' is an invalid keyword argument for this function
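In Python 3, io.open is the same function as the built-in open, so code written with io.open should run unchanged on both versions. A small sketch, using a throwaway temporary file in place of u.item (the file contents are made up):

```python
import io
import os
import tempfile

# Create a throwaway file holding one ISO-8859-1 encoded line.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(u"caf\u00e9\n".encode("ISO-8859-1"))

# io.open accepts the encoding argument on Python 2 and Python 3 alike.
with io.open(path, encoding="ISO-8859-1") as f:
    lines = [line.rstrip(u"\n") for line in f]

os.remove(path)
```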
#5
3
This also worked for me. ISO 8859-1 is going to save a lot of trouble, hahaha, mainly if you are using Speech Recognition APIs.
Example:
file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")
#6
2
If anyone is looking for this, here is an example of opening a CSV file for conversion in Python 3:
import csv
from sys import argv

try:
    inputReader = csv.reader(open(argv[1], encoding='ISO-8859-1'), delimiter=',', quotechar='"')
except IOError:
    pass
#7
1
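Simplest of all solutions:
Use pandas to read the file; it's very simple:
import pandas as pd
data = pd.read_csv('file_name.csv', encoding='utf-8')
Note that encoding='utf-8' only helps when the file really is UTF-8. For a file like the one in the question, pass the file's actual encoding instead, for example encoding='ISO-8859-1'.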