Displaying an escaped string as Unicode in Python

Time: 2021-06-06 20:14:24

I have only known Python for a few days, and Unicode seems to be a problem in Python.

I have a text file that stores a text string like this:

'\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1'

I can read the file and print the string, but it displays incorrectly. How can I print it to the screen correctly, as follows:

"Đèn đỏ nút giao thông Ngã tư Láng Hạ"

Thanks in advance.

3 answers

#1


8  

>>> x=r'\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1'
>>> u=unicode(x, 'unicode-escape')
>>> print u
Đèn đỏ nút giao thông Ngã tư Láng Hạ

This works on a Mac, where Terminal.app correctly sets sys.stdout.encoding to utf-8. If your platform doesn't set that attribute correctly (or at all), you'll need to replace the last line with

print u.encode('utf8')

or with whatever other encoding your terminal/console uses.
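For readers on Python 3, where `print` does this encoding step itself using `sys.stdout.encoding`, the following minimal sketch shows which encoding is in play and what bytes result. It is not part of the original answer, and the helper name `encode_for_console` is hypothetical.

```python
import sys

def encode_for_console(text, encoding=None):
    """Encode text to bytes for a console (hypothetical helper)."""
    # Prefer the caller's choice, then what stdout reports, then UTF-8.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding)

# Show which encoding Python believes the console uses.
print(sys.stdout.encoding)

# The first three characters of the sample string, encoded as UTF-8.
print(encode_for_console("\u0110\xe8n", "utf-8"))  # b'\xc4\x90\xc3\xa8n'
```

With the default strict error handling, the helper raises UnicodeEncodeError when the chosen encoding cannot represent a character.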

Note that the first line assigns a raw string literal so that the escape sequences are not expanded. That mimics what happens when the bytestring x is read from a (text or binary) file with that literal content.
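The file-reading case described above can be sketched in Python 3 roughly as follows. This is an illustration rather than the answerer's code: the function name `read_escaped_file` is hypothetical, and it assumes the file holds only ASCII text plus backslash escapes, because the `unicode_escape` codec treats any other byte as Latin-1.

```python
import os
import tempfile

# The file content from the question: literal backslashes, not real characters.
raw = rb"\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1"

def read_escaped_file(path):
    """Read ASCII text containing backslash escapes and decode them (hypothetical helper)."""
    with open(path, "rb") as handle:
        data = handle.read().strip()
    # 'unicode_escape' interprets the \uXXXX and \xXX sequences in the bytes.
    return data.decode("unicode_escape")

# Write the sample to a temporary file so the sketch is self-contained.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, raw)
finally:
    os.close(fd)

text = read_escaped_file(path)
os.remove(path)

# 76 bytes of escaped text decode to 36 characters.
print(len(raw), len(text))
```

Reading in binary mode keeps the example close to the bytestring in the answer; the decoded `text` can then be passed to `print` on a console that supports Vietnamese.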

#2


1  

It helps to show a simple example with the code and output of what you have explicitly tried. At a guess, your console doesn't support Vietnamese. Here are some options:

# A byte string with Unicode escapes as text.
>>> x='\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1'

# Convert to Unicode string.
>>> x=x.decode('unicode-escape')
>>> x
u'\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1'

# Try to print to my console:
>>> print x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\dev\python\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0110' in position 0:
  character maps to <undefined>

# My console's encoding is cp437.
# Instead of the default strict error handling that throws exceptions, try:
>>> print x.encode('cp437','replace')
?èn ?? nút giao thông Ng? t? Láng H?    

# Six characters weren't supported.
# Here's a way to write the text to a temp file and display it with another
# program that supports the UTF-8 encoding:
>>> import tempfile
>>> f,name=tempfile.mkstemp()
>>> import os
>>> os.write(f,x.encode('utf8'))
48
>>> os.close(f)
>>> os.system('notepad.exe '+name)

Hope that helps.
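The two options in this answer (replacing unsupported characters, and writing a UTF-8 file for another program to display) might look like this on Python 3. It is a sketch under that assumption, not the answerer's code; cp437 is used only because it is the console encoding in the transcript above.

```python
import os
import tempfile

text = "\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1"

# errors='replace' substitutes '?' for characters cp437 cannot represent.
fallback = text.encode("cp437", errors="replace").decode("cp437")
print(fallback.count("?"))  # 6

# Write the full text as UTF-8 so a UTF-8-aware viewer can display it.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as handle:
    handle.write(text)

# Read it back to confirm nothing was lost, then clean up.
with open(path, encoding="utf-8") as handle:
    round_trip = handle.read()
os.remove(path)
```

The replaced version loses information, while the UTF-8 file keeps every character, which is why the answer suggests opening the file in a separate viewer.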

#3


0  

Try this:

>>> s=u"\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1"
>>> print s
=> Đèn đỏ nút giao thông Ngã tư Láng Hạ
