UnicodeDecodeError: 'utf-8' codec can't decode byte

Time: 2021-08-26 10:27:58

Here is my code:

for line in open('u.item'):
    pass  # read each line

Whenever I run this code, it gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

I tried to solve this by adding an extra parameter to open(), so the code now looks like this:

for line in open('u.item', encoding='utf-8'):
    pass  # read each line

But it gives the same error again. What should I do? Please help.

7 solutions

#1


167  

As suggested by Mark Ransom, I found the right encoding for that problem. The encoding was "ISO-8859-1", so replacing open("u.item", encoding="utf-8") with open('u.item', encoding = "ISO-8859-1") will solve the problem.

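A minimal sketch of that fix. It uses a throwaway temporary file in place of u.item, and the sample line is made up for illustration, not taken from the real file:

```python
import os
import tempfile

# Hypothetical stand-in for u.item: one line encoded as ISO-8859-1,
# which stores "é" as the single byte 0xe9.
sample = "1|Café Society (1995)\n".encode("ISO-8859-1")

with tempfile.NamedTemporaryFile(delete=False, suffix=".item") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Naming the encoding explicitly avoids relying on the platform default.
    with open(path, encoding="ISO-8859-1") as f:
        lines = [line.rstrip("\n") for line in f]
finally:
    os.remove(path)

print(lines)
```

The with block also closes the file for you, which the bare open() in a for loop does not guarantee.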

#2


18  

Your file doesn't actually contain UTF-8 encoded data; it uses some other encoding. Figure out what that encoding is and pass it in the open() call.
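If you have no documentation for the file, one rough way to narrow it down is to try a few candidate encodings in order. This is only a sketch: the function name and the candidate list are my own assumptions, and a decode that succeeds does not prove the encoding is the right one. ISO-8859-1 in particular accepts every byte value, so it goes last.

```python
def first_working_encoding(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate that decodes data without an error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes, try the next
        return name
    return None


raw = b"caf\xe9 au lait"  # made-up sample bytes containing 0xe9
guess = first_working_encoding(raw)
print(guess)
```

Always check the decoded text by eye afterwards, since several single-byte encodings will happily decode the same bytes into different characters.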

In the Windows-1252 encoding, for example, the byte 0xe9 would be the character é.
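You can check this in the interpreter. The snippet below is a small illustration with made-up bytes: the same byte string decodes cleanly as Windows-1252 or ISO-8859-1 but is rejected by the UTF-8 codec, because in UTF-8 0xe9 starts a multi-byte sequence and the byte after it is not a continuation byte.

```python
raw = b"\xe9t\xe9"  # "été" as stored in Windows-1252 / ISO-8859-1

as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("iso-8859-1")

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # exc.reason holds the short explanation shown at the end of the traceback
    utf8_error = exc.reason

print(as_cp1252, as_latin1, utf8_error)
```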

#3


9  

Try reading the file with pandas:

import pandas as pd
pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')  # m_cols: your list of column names

#4


6  

If you are using Python 2, the following will be the solution:

import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    # do something

io.open() is needed because the built-in open() in Python 2 does not accept an encoding parameter. If you pass one, you will get the following error:

TypeError: 'encoding' is an invalid keyword argument for this function

#5


3  

This also worked for me. ISO 8859-1 is going to save me a lot of trouble, hahaha, mainly when using Speech Recognition APIs.

Example:

file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

#6


2  

If anyone is looking for this, here is an example for converting a CSV file in Python 3:

import csv
from sys import argv

try:
    # argv[1] is the path to the CSV file given on the command line
    inputReader = csv.reader(open(argv[1], encoding='ISO-8859-1'), delimiter=',', quotechar='"')
except IOError:
    pass
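The snippet above only opens the reader. As a rough sketch of the conversion step itself, assuming the goal is to rewrite the file as UTF-8, something like the following could work. The function name convert_csv and the demo file names are hypothetical:

```python
import csv
import os
import tempfile


def convert_csv(src_path, dst_path, src_encoding="ISO-8859-1"):
    """Copy a CSV file row by row, writing the output as UTF-8."""
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst, delimiter=",", quotechar='"')
        for row in csv.reader(src, delimiter=",", quotechar='"'):
            writer.writerow(row)


# Demo with throwaway files and made-up contents.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.csv")
dst = os.path.join(workdir, "out.csv")
with open(src, "wb") as f:
    f.write("id,name\r\n1,Andr\xe9\r\n".encode("ISO-8859-1"))

convert_csv(src, dst)

with open(dst, "rb") as f:
    converted = f.read()
print(converted)
```

Once the file is stored as UTF-8, a plain open(path, encoding='utf-8') reads it without the original error.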

#7


1  

The simplest of all solutions:

Use pandas to read the file; it's very simple:

import pandas as pd
data = pd.read_csv('file_name.csv', encoding='ISO-8859-1')  # 'utf-8' would hit the same error on a file like u.item
