UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128)

Time: 2022-03-21 20:21:48

I am parsing an xls file using xlrd. Most things work fine. I have a dictionary whose keys are strings and whose values are lists of strings. All the keys and values are unicode. I can print most of the keys and values using str(), but some values contain the Unicode character \u2013, for which I get the above error.

I suspect this happens because it is unicode embedded in unicode and the Python interpreter cannot decode it. How can I get rid of this error?

Thanks in advance.

5 Answers

#1


58  

You can print Unicode objects directly; you don't need to wrap them in str().
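
As a rough sketch of what that means for a dictionary like the one in the question, the text can stay in Unicode while it is formatted and be encoded only once at the end. The dictionary contents and the names sheet, report and data are made up for illustration; the snippet is written to run on both Python 2.7 and Python 3.

```python
# Hypothetical data shaped like the asker's dictionary:
# unicode keys mapping to lists of unicode strings.
sheet = {u'period': [u'2010\u20132012', u'2013']}

lines = []
for key, values in sheet.items():
    # A u'' format string with unicode operands stays in Unicode,
    # so the ASCII codec is never involved (no str() call needed).
    lines.append(u'%s: %s' % (key, u', '.join(values)))

report = u'\n'.join(lines)

# Encode once, at the output boundary (file, socket, and so on).
data = report.encode('utf-8')
```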

Assuming you really want a str:

When you do str(u'\u2013') you are trying to convert the Unicode string to an 8-bit string. To do this you need an encoding, a mapping from Unicode data to 8-bit data. str() uses the system default encoding, which under Python 2 is ASCII. ASCII contains only the first 128 code points of Unicode, that is \u0000 to \u007F. The result is the above error: the ASCII codec simply cannot represent \u2013 (it's an en dash, by the way).
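
The failure can be reproduced without str() by asking for the ASCII codec explicitly, which is roughly what str() does implicitly on Python 2. This is only an illustrative sketch; the variable names are invented and the code runs on Python 2.7 and Python 3.

```python
dash = u'\u2013'  # EN DASH, code point 0x2013

try:
    dash.encode('ascii')   # the conversion str(dash) attempts on Python 2
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    reason = exc.reason    # the text after the last colon in the error message

# Code points up to U+007F fit in ASCII, so this one encodes fine.
ascii_ok = u'\u007f'.encode('ascii')
```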

You therefore need to specify which encoding you want to use. Common ones are ISO-8859-1, better known as Latin-1, which contains the first 256 code points; UTF-8, which can encode all code points by using a variable-length encoding; CP1252, which is common on Windows; and various Chinese and Japanese encodings.

You use them like this:

u'\u2013'.encode('utf8')

The result is a str containing the sequence of bytes that is the UTF-8 representation of the character in question:

'\xe2\x80\x93'

And you can print it:

>>> print '\xe2\x80\x93'
–
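
To compare the encodings mentioned above on this particular character, here is a small sketch (variable names are illustrative; runs on Python 2.7 and Python 3). It suggests why the choice matters: UTF-8 and CP1252 can both represent the dash, with different bytes, while Latin-1 cannot represent it at all.

```python
dash = u'\u2013'

utf8_bytes = dash.encode('utf-8')      # three bytes
cp1252_bytes = dash.encode('cp1252')   # one byte: CP1252 maps 0x96 to the en dash

try:
    dash.encode('latin-1')             # Latin-1 stops at U+00FF
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Decoding with the same codec restores the original text.
round_trip = utf8_bytes.decode('utf-8')
```

Whoever reads the bytes has to decode them with the same encoding, so it is worth deciding on one (UTF-8 is a common choice) and using it consistently.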

#2


22  

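You can also try this to get the text. Note that the 'ignore' error handler silently drops every character ASCII cannot represent, so the dash is lost: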

foo.encode('ascii', 'ignore')
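
For comparison, here is a sketch of what 'ignore' and the other standard error handlers produce for a value containing the dash. The sample string is made up; the snippet runs on Python 2.7 and Python 3.

```python
text = u'2010\u20132012'   # hypothetical cell value containing an en dash

ignored = text.encode('ascii', 'ignore')             # drops the dash entirely
replaced = text.encode('ascii', 'replace')           # substitutes '?'
xml_ref = text.encode('ascii', 'xmlcharrefreplace')  # numeric character reference
escaped = text.encode('ascii', 'backslashreplace')   # Python-style escape sequence
```

All four lose or alter information relative to encoding as UTF-8, so they are mainly useful when the output really must be pure ASCII.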

#3


6  

str(u'\u2013') raises the error, so encode explicitly instead. Use isinstance(foo, basestring) to check whether foo is already a string (str or unicode); if it is not, convert it to unicode first and then encode:

if isinstance(foo, basestring):
    # Caveat: if foo is a byte str holding non-ASCII bytes, Python 2
    # decodes it as ASCII first and raises UnicodeDecodeError.
    encoded = foo.encode('utf8')
else:
    encoded = unicode(foo).encode('utf8')
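
A variant of the same idea, sketched as a helper that leaves byte strings untouched and can be applied to a whole dictionary of lists. The names text_type, to_utf8, sheet and encoded are hypothetical, and the try/except at the top is only there so the sketch runs on both Python 2.7 and Python 3.

```python
try:
    text_type = unicode   # Python 2: the Unicode text type
except NameError:
    text_type = str       # Python 3: str is already Unicode


def to_utf8(value):
    """Return value as UTF-8 encoded bytes."""
    if isinstance(value, bytes):
        # Already an 8-bit string (str on Python 2): leave it alone
        # instead of risking an implicit ASCII decode.
        return value
    if not isinstance(value, text_type):
        # Numbers, dates and so on: convert to text first.
        value = text_type(value)
    return value.encode('utf-8')


# Hypothetical data shaped like the asker's dictionary.
sheet = {u'period': [u'2010\u20132012', 42]}
encoded = dict((to_utf8(k), [to_utf8(v) for v in vs]) for k, vs in sheet.items())
```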

#4


4  

I had the same problem. This works fine for me:

str(objdata).encode('utf-8')

#5


0  

For me this works:

unicode(data).encode('utf-8')
