This question already has an answer here:
- Python, Unicode, and the Windows console (11 answers)
I'm working on a Python application that prints text in multiple languages to the console on multiple platforms. The program works well on all UNIX platforms, but on Windows there are errors when printing Unicode strings at the command line.
There's already a relevant thread about this (Windows cmd encoding change causes Python crash), but I couldn't find my specific answer there.
For example, for the following Asian text, on Linux I can run:
>>> print u"\u5f15\u8d77\u7684\u6216".encode("utf-8")
引起的或
But on Windows I get:
>>> print u"\u5f15\u8d77\u7684\u6216".encode("utf-8")
σ╝ץΦ╡╖τתהµטצ
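The garbled output is what UTF-8 bytes look like when the console decodes them with a legacy OEM code page. Here is a minimal sketch in Python 3 syntax, assuming the console was using code page 862. That is a guess based on the Hebrew letters in the output; the question does not say which code page was active.

```python
# Python 3 sketch: reproduce the mojibake by decoding UTF-8 bytes with an
# OEM code page. cp862 is an assumption, not something the question states.
text = "\u5f15\u8d77\u7684\u6216"

utf8_bytes = text.encode("utf-8")      # the bytes sent to the console
garbled = utf8_bytes.decode("cp862")   # how a cp862 console would read them

# Each 3-byte character turns into three unrelated glyphs.
print(len(text), len(utf8_bytes), len(garbled))   # 4 12 12

# The bytes themselves are intact, so the mapping is reversible.
restored = garbled.encode("cp862").decode("utf-8")
assert restored == text
```

In other words, nothing is lost in transit; the console simply interprets the bytes with the wrong table.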
I managed to display the correct text in a message box by doing something like this:
>>> file("bla.vbs", "w").write(u'MsgBox "\u5f15\u8d77\u7684\u6216", 4, "MyTitle"'.encode("utf-16"))
>>> os.system("cscript //U //NoLogo bla.vbs")
However, I want to be able to do it in the Windows console, preferably without requiring much configuration outside my Python code (because my application will be distributed to many hosts).
Is this possible?
Edit: If it's not possible, I would be happy to accept other suggestions for writing a console application on Windows that displays Unicode, e.g. a Python implementation of an alternative Windows console.
5 solutions
#1
2
There's a WriteConsoleW solution that provides Unicode argv and stdout (print) but not stdin: Windows cmd encoding change causes Python crash
The only thing I modified is sys.argv, to keep it Unicode. The original version encoded it as UTF-8 for some reason.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
""" https://*.com/questions/878972/windows-cmd-encoding-change-causes-python-crash#answer-3259271
"""
import sys

if sys.platform == "win32":
    import codecs
    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import BOOL, HANDLE, DWORD, LPWSTR, LPCWSTR, LPVOID

    original_stderr = sys.stderr

    # If any exception occurs in this code, we'll probably try to print it on stderr,
    # which makes for frustrating debugging if stderr is directed to our wrapper.
    # So be paranoid about catching errors and reporting them to original_stderr,
    # so that we can at least see them.
    def _complain(message):
        print >>original_stderr, message if isinstance(message, str) else repr(message)

    # Work around <http://bugs.python.org/issue6058>.
    codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)

    # Make Unicode console output work independently of the current code page.
    # This also fixes <http://bugs.python.org/issue1602>.
    # Credit to Michael Kaplan <http://www.siao2.com/2010/04/07/9989346.aspx>
    # and TZOmegaTZIOY
    # <https://*.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462>.
    try:
        # <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
        # HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
        # returns INVALID_HANDLE_VALUE, NULL, or a valid handle
        #
        # <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
        # DWORD WINAPI GetFileType(DWORD hFile);
        #
        # <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
        # BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode);
        GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
        STD_OUTPUT_HANDLE = DWORD(-11)
        STD_ERROR_HANDLE = DWORD(-12)
        GetFileType = WINFUNCTYPE(DWORD, DWORD)(("GetFileType", windll.kernel32))
        FILE_TYPE_CHAR = 0x0002
        FILE_TYPE_REMOTE = 0x8000
        GetConsoleMode = WINFUNCTYPE(BOOL, HANDLE, POINTER(DWORD))(("GetConsoleMode", windll.kernel32))
        INVALID_HANDLE_VALUE = DWORD(-1).value

        def not_a_console(handle):
            if handle == INVALID_HANDLE_VALUE or handle is None:
                return True
            return ((GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR
                    or GetConsoleMode(handle, byref(DWORD())) == 0)

        old_stdout_fileno = None
        old_stderr_fileno = None
        if hasattr(sys.stdout, 'fileno'):
            old_stdout_fileno = sys.stdout.fileno()
        if hasattr(sys.stderr, 'fileno'):
            old_stderr_fileno = sys.stderr.fileno()

        STDOUT_FILENO = 1
        STDERR_FILENO = 2
        real_stdout = (old_stdout_fileno == STDOUT_FILENO)
        real_stderr = (old_stderr_fileno == STDERR_FILENO)

        if real_stdout:
            hStdout = GetStdHandle(STD_OUTPUT_HANDLE)
            if not_a_console(hStdout):
                real_stdout = False

        if real_stderr:
            hStderr = GetStdHandle(STD_ERROR_HANDLE)
            if not_a_console(hStderr):
                real_stderr = False

        if real_stdout or real_stderr:
            # BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
            # LPDWORD lpCharsWritten, LPVOID lpReserved);
            WriteConsoleW = WINFUNCTYPE(BOOL, HANDLE, LPWSTR, DWORD, POINTER(DWORD), LPVOID)(("WriteConsoleW", windll.kernel32))

            class UnicodeOutput:
                def __init__(self, hConsole, stream, fileno, name):
                    self._hConsole = hConsole
                    self._stream = stream
                    self._fileno = fileno
                    self.closed = False
                    self.softspace = False
                    self.mode = 'w'
                    self.encoding = 'utf-8'
                    self.name = name
                    self.flush()

                def isatty(self):
                    return False

                def close(self):
                    # don't really close the handle, that would only cause problems
                    self.closed = True

                def fileno(self):
                    return self._fileno

                def flush(self):
                    if self._hConsole is None:
                        try:
                            self._stream.flush()
                        except Exception as e:
                            _complain("%s.flush: %r from %r" % (self.name, e, self._stream))
                            raise

                def write(self, text):
                    try:
                        if self._hConsole is None:
                            if isinstance(text, unicode):
                                text = text.encode('utf-8')
                            self._stream.write(text)
                        else:
                            if not isinstance(text, unicode):
                                text = str(text).decode('utf-8')
                            remaining = len(text)
                            while remaining:
                                n = DWORD(0)
                                # There is a shorter-than-documented limitation on the
                                # length of the string passed to WriteConsoleW (see
                                # <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232>.
                                retval = WriteConsoleW(self._hConsole, text, min(remaining, 10000), byref(n), None)
                                if retval == 0 or n.value == 0:
                                    raise IOError("WriteConsoleW returned %r, n.value = %r" % (retval, n.value))
                                remaining -= n.value
                                if not remaining:
                                    break
                                text = text[n.value:]
                    except Exception as e:
                        _complain("%s.write: %r" % (self.name, e))
                        raise

                def writelines(self, lines):
                    try:
                        for line in lines:
                            self.write(line)
                    except Exception as e:
                        _complain("%s.writelines: %r" % (self.name, e))
                        raise

            if real_stdout:
                sys.stdout = UnicodeOutput(hStdout, None, STDOUT_FILENO, '<Unicode console stdout>')
            else:
                sys.stdout = UnicodeOutput(None, sys.stdout, old_stdout_fileno, '<Unicode redirected stdout>')

            if real_stderr:
                sys.stderr = UnicodeOutput(hStderr, None, STDERR_FILENO, '<Unicode console stderr>')
            else:
                sys.stderr = UnicodeOutput(None, sys.stderr, old_stderr_fileno, '<Unicode redirected stderr>')
    except Exception as e:
        _complain("exception %r while fixing up sys.stdout and sys.stderr" % (e,))

    # While we're at it, let's unmangle the command-line arguments:
    # This works around <http://bugs.python.org/issue2128>.
    GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
    CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int))(("CommandLineToArgvW", windll.shell32))

    argc = c_int(0)
    argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))
    argv = [argv_unicode[i] for i in xrange(0, argc.value)]
    # argv = [argv_unicode[i].encode('utf-8') for i in xrange(0, argc.value)]

    if not hasattr(sys, 'frozen'):
        # If this is an executable produced by py2exe or bbfreeze, then it will
        # have been invoked directly. Otherwise, unicode_argv[0] is the Python
        # interpreter, so skip that.
        argv = argv[1:]

        # Also skip option arguments to the Python interpreter.
        while len(argv) > 0:
            arg = argv[0]
            if not arg.startswith(u"-") or arg == u"-":
                break
            argv = argv[1:]
            if arg == u'-m':
                # sys.argv[0] should really be the absolute path of the module source,
                # but never mind
                break
            if arg == u'-c':
                argv[0] = u'-c'
                break

    # if you like:
    sys.argv = argv
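The core of UnicodeOutput.write is a loop that hands the console at most 10000 characters at a time and retries with whatever was not accepted. The sketch below isolates that loop in Python 3 syntax so it can be tried on any platform. The names `write_all` and `fake_console_write` are hypothetical, and the fake writer only stands in for WriteConsoleW; it is not the Windows API.

```python
# Python 3 sketch of the retry loop in UnicodeOutput.write. The console call
# is replaced by any callable that reports how many characters it accepted.

def write_all(text, write_some, limit=10000):
    """Send text through write_some in pieces of at most `limit` characters.

    write_some(chunk) must return the number of characters it consumed,
    which may be fewer than len(chunk).
    """
    while text:
        written = write_some(text[:limit])
        if written <= 0:
            raise IOError("writer made no progress")
        text = text[written:]   # drop what was consumed, retry the rest


received = []

def fake_console_write(chunk):
    # Stand-in for WriteConsoleW: accepts at most 4 characters per call.
    accepted = chunk[:4]
    received.append(accepted)
    return len(accepted)

message = "\u5f15\u8d77\u7684\u6216" * 3
write_all(message, fake_console_write, limit=10)
print(len(received))   # 3
```

One caveat: this sketch counts Python 3 characters, whereas WriteConsoleW counts UTF-16 code units, so text outside the Basic Multilingual Plane would need extra care.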
#2
1
Use a different console program. The following works in mintty, the default terminal emulator in Cygwin.
>>> print u"\u5f15\u8d77\u7684\u6216"
引起的或
There are other console alternatives available for Windows but I have not assessed their Unicode support.
#3
0
This simply comes from the fact that the cmd and PowerShell consoles do not support variable-width fonts, and the fixed-width fonts do not include Chinese script. Cygwin is in the same situation.
PuTTY is more advanced, supporting variable-width fonts with Cyrillic, Vietnamese, and Arabic scripts, but no Chinese so far.
HTH
#4
-2
Can you try using the program iconv on Windows, and piping your Python output through it? It would go something like this:
python foo.py | iconv -f utf-8 -t utf-16
You might have to do a little work to get iconv on Windows. It's part of Cygwin, but you may be able to build it separately if needed.
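For reference, the re-encoding that `iconv -f utf-8 -t utf-16` performs can be sketched in Python itself (Python 3 syntax). The helper name `reencode` is hypothetical, and whether the console then renders the UTF-16 bytes correctly is a separate question that this sketch does not settle.

```python
def reencode(data, source="utf-8", target="utf-16"):
    """Decode bytes from `source` and encode the text again as `target`."""
    return data.decode(source).encode(target)


utf8_bytes = "\u5f15\u8d77\u7684\u6216".encode("utf-8")
utf16_bytes = reencode(utf8_bytes)

# Python's "utf-16" codec prepends a 2-byte byte order mark, then uses
# 2 bytes for each of these four characters.
print(len(utf8_bytes), len(utf16_bytes))   # 12 10
```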
#5
-3
The question is answered in the PrintFails article.
By default, the console in Microsoft Windows only displays 256 characters (cp437, i.e. code page 437, the original 1981 IBM PC extended ASCII character set).
For Russia this means CP866; other countries use their own code pages too. So, to read Python output correctly in the Windows console, Windows must be configured with a native code page that can display the printed symbols.
I suggest that you always print Unicode text without encoding it yourself, to ensure maximum compatibility across platforms.
If you try to print a character that the console encoding cannot represent, you will get a UnicodeEncodeError or see distorted text.
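The failure mode can be reproduced without a Windows console by encoding the text to a legacy code page by hand. This is a minimal sketch in Python 3 syntax, using cp437 because the answer names it as the default; the actual code page on a given machine may differ.

```python
text = "\u5f15\u8d77\u7684\u6216"

try:
    text.encode("cp437")
    failed = False
except UnicodeEncodeError:
    failed = True              # cp437 has no mapping for these characters

# errors="replace" trades the exception for placeholder characters.
replaced = text.encode("cp437", errors="replace")
print(failed, replaced)        # True b'????'
```

Neither outcome shows the intended text, which matches the two symptoms described above: an exception, or output that is no longer readable.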
In some cases, if Python fails to determine the output encoding correctly, you might try setting the PYTHONIOENCODING environment variable. Note, however, that this probably won't work for your example, as your console is unable to display Asian text in its current configuration.
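To see what PYTHONIOENCODING changes, the sketch below (Python 3 syntax) runs a child interpreter with the variable set and inspects the raw bytes it writes to a pipe. It only demonstrates the effect on the bytes Python emits; it says nothing about whether a particular console can render them.

```python
import os
import subprocess
import sys

# Run a child interpreter with PYTHONIOENCODING set and capture its raw output.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
child_code = "print('\\u5f15\\u8d77\\u7684\\u6216')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# The child encoded its output as UTF-8, whatever the locale default is.
decoded = result.stdout.decode("utf-8").strip()
assert decoded == "\u5f15\u8d77\u7684\u6216"
```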
To reconfigure the console, use Control Panel -> Language and Regional Settings -> Advanced (tab) -> Non-Unicode programs language (section). Note that I translated the menu names from Russian.
See also the answers to a very similar question.