UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 17710: ordinal not in range(128)

Date: 2020-12-08 20:21:31

I'm trying to print a string from an archived web crawl, but when I do I get this error:

print page['html']
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 17710: ordinal not in range(128)

When I try print unicode(page['html'], errors='ignore') instead, I get:

print unicode(page['html'],errors='ignore')
TypeError: decoding Unicode is not supported

Any idea how I can properly encode this string, or at least get it to print? Thanks.

1 answer

#1


20  

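You need to encode the unicode string you saved in order to display it, not decode it -- a unicode object is already the decoded form, which is why unicode(..., errors='ignore') fails with "decoding Unicode is not supported". Always specify an encoding explicitly so that your code is portable. The usual choice is utf-8: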

print page['html'].encode('utf-8')
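
To make the encode/decode distinction concrete, here is a minimal sketch that runs on both Python 2 and Python 3. The variable `text` is a hypothetical stand-in for `page['html']`; the real page content is not known here.

```python
# Hypothetical stand-in for page['html']: a unicode string containing u'\xe7'.
text = u'fa\xe7ade'

# Encoding to ASCII fails because u'\xe7' has no ASCII representation.
# This is the same failure that an implicit ASCII encode triggers on print.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding to UTF-8 succeeds and yields a byte string.
utf8_bytes = text.encode('utf-8')

# Decoding those bytes with the same codec recovers the original text.
round_trip = utf8_bytes.decode('utf-8')
```

Encoding goes from unicode to bytes and decoding goes from bytes to unicode, so only the first direction applies to a value that is already unicode.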

If you don't specify an encoding, whether or not it works will depend on what you're printing to -- your editor, OS, terminal program, etc.
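
One way to see that dependence is to look at the encoding the output stream reports. The sketch below is an assumption-laden illustration, not part of the original answer: `encode_for_stdout` is a hypothetical helper name, and falling back to utf-8 is a choice made here for the example.

```python
import sys


def encode_for_stdout(text, fallback='utf-8'):
    """Encode unicode text for the current stdout (hypothetical helper).

    sys.stdout.encoding may be None or missing, for example when output
    is piped on Python 2, so fall back to an explicit encoding then.
    """
    encoding = getattr(sys.stdout, 'encoding', None) or fallback
    # 'replace' substitutes '?' for characters the target encoding
    # cannot represent instead of raising UnicodeEncodeError.
    return text.encode(encoding, 'replace')


encoded = encode_for_stdout(u'fa\xe7ade')
```

The result differs between, say, a UTF-8 terminal and an ASCII one, which is why passing an explicit encoding such as 'utf-8' gives more predictable output.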
