Differences in Python unicode handling between print and sys.stdout.write

Date: 2022-06-11 00:05:51

I'll start by saying that I've already seen this post: Strange python print behavior with unicode, but the solution offered there (using PYTHONIOENCODING) didn't work for me.

Here's my issue:

Python 2.6.5 (r265:79063, Apr  9 2010, 11:16:46)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
>>> a = u'\xa6'
>>> print a
¦

works just fine, however:

>>> import sys
>>> sys.stdout.write(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 0: ordinal not in range(128)

throws an error. The post I linked to at the top suggests that this is because the default console encoding is 'ascii'. However, in my case it's not:

>>> sys.stdout.encoding
'UTF-8'

So any thoughts on what's at work here and how to fix this issue?

Thanks D.

1 Answer

#1


12  

This is due to a long-standing bug that was fixed in python-2.7, but too late to be back-ported to python-2.6.

The documentation states that when unicode strings are written to a file, they should be converted to byte strings using file.encoding. But this was not being honoured by sys.stdout, which instead was using the default unicode encoding. This is usually set to "ascii" by the site module, but it can be changed with sys.setdefaultencoding:

Python 2.6.7 (r267:88850, Aug 14 2011, 12:32:40) [GCC 4.6.2] on linux3
>>> import sys
>>> a = u'\xa6\n'
>>> sys.stdout.write(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' ...
>>> reload(sys).setdefaultencoding('utf8')
>>> sys.stdout.write(a)
¦
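
To see the two codecs side by side without changing the interpreter-wide default, the encoding step can be done by hand. This is only a rough sketch: the helper name encode_or_error is made up for illustration, and the snippet is written so that it should run on Python 2.6+ as well as Python 3.

```python
import sys


def encode_or_error(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


a = u'\xa6\n'  # BROKEN BAR followed by a newline, as in the session above

as_ascii = encode_or_error(a, 'ascii')  # what an 'ascii' default codec attempts
as_utf8 = encode_or_error(a, 'utf-8')   # what a UTF-8 terminal encoding gives

print(repr(as_ascii))
print(repr(as_utf8))

# The two settings involved; they need not agree.
print(sys.getdefaultencoding())
print(getattr(sys.stdout, 'encoding', None))
```

The 'ascii' attempt fails with the same UnicodeEncodeError as in the traceback, while the 'utf-8' attempt yields the two-byte sequence for the character.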

However, a better solution might be to replace sys.stdout with a wrapper:

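import sys

class StdOut(object):
    def write(self, string):
        if isinstance(string, unicode):
            # Encode with the terminal's encoding, not the default codec.
            # encoding is None when output is piped; UTF-8 is an assumed fallback.
            string = string.encode(sys.__stdout__.encoding or 'utf-8')
        sys.__stdout__.write(string)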

>>> sys.stdout = StdOut()
>>> sys.stdout.write(a)
¦
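
The wrapper above only implements write, so code that calls sys.stdout.flush() or reads other file attributes would fail against it. A minimal sketch of a more general variant follows. The class name EncodingWriter and its parameters are hypothetical, not part of the original answer, and it wraps an arbitrary byte stream so that it can be tried without replacing sys.stdout. The standard library's codecs.getwriter is shown next to it for comparison, since it gives a similar result for text input.

```python
import codecs
import io

TEXT_TYPE = type(u'')  # `unicode` on Python 2, `str` on Python 3


class EncodingWriter(object):
    """Hypothetical wrapper: encode text explicitly before writing bytes."""

    def __init__(self, stream, encoding='utf-8', errors='strict'):
        self._stream = stream
        self._encoding = encoding
        self._errors = errors

    def write(self, data):
        # Encode text ourselves rather than relying on an implicit default codec.
        if isinstance(data, TEXT_TYPE):
            data = data.encode(self._encoding, self._errors)
        self._stream.write(data)

    def __getattr__(self, name):
        # Delegate everything else (flush, fileno, ...) to the wrapped stream.
        return getattr(self._stream, name)


buf = io.BytesIO()
out = EncodingWriter(buf)
out.write(u'\xa6\n')  # text is encoded to UTF-8
out.write(b'ok\n')    # bytes pass through unchanged
out.flush()           # delegated to the underlying stream

# Standard-library alternative with the same effect for text input.
buf2 = io.BytesIO()
codecs.getwriter('utf-8')(buf2).write(u'\xa6\n')
```

On Python 2.6 something like sys.stdout = EncodingWriter(sys.__stdout__, sys.__stdout__.encoding or 'utf-8') would presumably install it; the fallback matters because the encoding attribute is None when output is redirected to a pipe or file.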
