How to make Python 3 print() output UTF-8?

Date: 2022-02-11 00:07:06

How can I make Python 3 (3.1) print("Some text") write to stdout in UTF-8, or how can I output raw bytes?

Test.py

import sys

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
print(TestText)
print(TestText.encode("utf8"))
print(TestText.encode("cp1252","replace"))
print(TestText2)

Output (in CP1257; I replaced the non-ASCII characters with their byte values, written as [xNN]):

utf-8
cp1257
Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
b'Test - ??????..\x9a\x8a??\x9e\x8e'
b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'

print is just too smart... :D There's no point in passing encoded text to print (it only ever shows the representation of the bytes, not the real bytes), and it seems impossible to output bytes at all, because print always encodes its output in sys.stdout.encoding.

For example: print(chr(255)) throws an error:

Traceback (most recent call last):
  File "Test.py", line 1, in <module>
    print(chr(255));
  File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>

By the way, print(TestText == TestText2.decode("utf8")) returns False, although the print output is the same.
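A note on that comparison: the two literals in Test.py do not hold the same text (one starts with "Test", the other with "Test2"), and that alone is enough to make the comparison False. A minimal sketch, using shortened strings and hypothetical variable names rather than the originals:

```python
# Shortened stand-ins for TestText and TestText2 (names are illustrative).
text = "Test - \u0101\u0100"            # str: a sequence of Unicode code points
raw = b"Test2 - \xc4\x81\xc4\x80"       # bytes: UTF-8 data with a different prefix

# Decoding gives "Test2 - ...", which differs from "Test - ...".
same = text == raw.decode("utf-8")

# With the same prefix, the decoded bytes and the str compare equal.
raw_fixed = b"Test - \xc4\x81\xc4\x80"
same_fixed = text == raw_fixed.decode("utf-8")
```

So the False result says nothing about the encoding; once the prefixes match, the decoded bytes equal the string.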

How does Python 3 determine sys.stdout.encoding and how can I change it?

I made a printRAW() function which works fine (actually it encodes output to UTF-8, so really it's not raw...):

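def printRAW(*Text):
    # Open file descriptor 1 (stdout) as a UTF-8 text stream;
    # closefd=False keeps fd 1 open after RAWOut is closed.
    RAWOut = open(1, 'w', encoding='utf8', closefd=False)
    print(*Text, file=RAWOut)
    RAWOut.flush()
    RAWOut.close()

printRAW("Cool", TestText)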

Output (now it prints in UTF-8):

Cool Test - āĀēĒčČ..šŠūŪžŽ

printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)

Now I'm looking for a better solution, if there is one...

2 Answers

#1


42  

First, a correction:

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is NOT UTF-8... it is a Unicode string in Python 3.x.
TestText2 = TestText.encode('utf8') # THIS is "just bytes" in UTF-8.

Now, to send UTF-8 to stdout, regardless of the console's encoding, use the right tool for the job:

import sys
sys.stdout.buffer.write(TestText2)

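"buffer" is the binary interface underneath stdout: it accepts bytes and writes them as they are, without applying the console's encoding.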
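To make that concrete, here is a minimal sketch of a helper built on the same idea. The name write_utf8 and its stream parameter are assumptions for illustration; they are not part of the standard library. The demonstration writes to an in-memory io.BytesIO so the exact bytes can be inspected.

```python
import io
import sys

def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the bytes to a binary stream.

    Hypothetical helper: falls back to sys.stdout.buffer when no
    stream is given.
    """
    if stream is None:
        stream = sys.stdout.buffer  # binary layer under sys.stdout
    data = text.encode("utf-8")
    stream.write(data)
    stream.flush()
    return len(data)

# Demonstration against an in-memory binary stream instead of the console.
demo = io.BytesIO()
written = write_utf8("Test - \u0101\u0100\n", demo)
```

Calling write_utf8("some text\n") with no second argument would send the bytes to the real stdout, whatever sys.stdout.encoding says.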

#2


13  

This is the best I can dope out from the manual, and it's a bit of a dirty hack:

utf8stdout = open(1, 'w', encoding='utf-8', closefd=False) # fd 1 is stdout
print(whatever, file=utf8stdout)

It seems like file objects should have a method to change their encoding, but AFAICT there isn't one.
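For readers on newer versions: since Python 3.7, text streams such as sys.stdout have a reconfigure() method that can change the encoding in place, and on any Python 3 the binary layer can be wrapped in a new io.TextIOWrapper. A hedged sketch follows; the function names utf8_text_stream and force_utf8_stdout are made up for illustration, and the demo uses an in-memory stream rather than the real console.

```python
import io
import sys

def utf8_text_stream(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")

def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 in place when reconfigure() exists.

    Returns False when sys.stdout has no reconfigure(), e.g. on
    Python < 3.7 or when stdout was replaced by another object.
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True

# Demo on an in-memory sink; for the console one would pass sys.stdout.buffer.
sink = io.BytesIO()
wrapped = utf8_text_stream(sink)
print("Cool", chr(252), end="", file=wrapped)
wrapped.flush()
captured = sink.getvalue()
```

Setting the PYTHONIOENCODING environment variable before starting the interpreter is another way to choose the stdout encoding without touching the code.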

If you write to utf8stdout and then write to sys.stdout without calling utf8stdout.flush() first, or vice versa, bad things may happen: each object keeps its own buffer, so the output can reach the terminal out of order.
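A small sketch of the flush-before-switching discipline, using two text wrappers over one in-memory sink as stand-ins for utf8stdout and sys.stdout. The helper name write_in_order is an assumption for illustration.

```python
import io

def write_in_order(stream, text):
    """Write text and flush at once, so nothing lingers in this
    stream's buffer when another stream writes to the same target."""
    stream.write(text)
    stream.flush()

# Two independent text layers sharing one underlying binary sink.
sink = io.BytesIO()
utf8_layer = io.TextIOWrapper(sink, encoding="utf-8")
other_layer = io.TextIOWrapper(sink, encoding="cp1257")

write_in_order(utf8_layer, "one ")
write_in_order(other_layer, "two ")
write_in_order(utf8_layer, "three")
ordered = sink.getvalue()
```

Because every write is flushed before the other layer is used, the bytes arrive in the order the calls were made.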
