I've just started learning Python using LPTHW, and it's really good. I'm only a couple of days into my studies and have reached exercise 16, which looks like this:
# -*- coding: utf-8 -*-
from sys import argv
script, filename = argv
print "We're going to erase %r." % filename
print "If you don't want that, hit CTRL-C (^C)."
print "If you do want that, hit RETURN."
raw_input("?")
print "Opening the file..."
target = open(filename, 'w')
print "Truncating the file. Goodbye!"
target.truncate()
print "Now I'm going to ask you for three lines."
line1 = raw_input("line 1: ")
line2 = raw_input("line 2: ")
line3 = raw_input("line 3: ")
print "I'm going to write these to the file."
target.write("%r\n%r\n%r\n" % (line1, line2, line3))
print "And finally, we close it."
target.close()
The problem is that I'm from a country with the letters "Å", "Ä" and "Ö" in the alphabet, but when I use these letters the output in the file (test.txt) looks something like this: u'hej' u'\xc5je' u'l\xe4get'
When I decode a string literal, I can do something like this: "hallå".decode("utf-8")
And it prints just fine.
But I also want input from the user to come out correctly, even when it contains non-ASCII characters. I have tried different things that either don't work or give me errors when running, for example:
line1 = raw_input("line 1: ").decode("utf-8")
I tried to google my problem, but the answers I found were either not very straightforward or written for much more experienced users.
If someone would take some time to explain the encoding/decoding of Unicode characters in a beginner-friendly way and give me an example of how I can get it to work, I would really appreciate it.
If it helps, I'm on Windows 10, running Python 2.7.10, and my system locale is set to Swedish.
4 Answers
#1
3
Here's a way to decode stdin. It generally works from the console, but IDEs sometimes replace the stdin object and don't always support its encoding attribute. I also modernized the code a bit, using with and io.open to handle encodings. Note that the file will be written in UTF-8, so open it with Notepad to see it correctly. Using type <filename> from the console will try to display the file with the console's stdout encoding.
#!python2
import sys
import io
script, filename = sys.argv
print "We're going to erase %s." % filename
print "If you don't want that, hit CTRL-C (^C)."
print "If you do want that, hit RETURN."
raw_input("?")
print "Now I'm going to ask you for three lines."
line1 = raw_input("line 1: ").decode(sys.stdin.encoding)
line2 = raw_input("line 2: ").decode(sys.stdin.encoding)
line3 = raw_input("line 3: ").decode(sys.stdin.encoding)
print "I'm going to write these to the file."
with io.open(filename, 'wt', encoding='utf8') as target:
    target.write(u"%s\n%s\n%s\n" % (line1, line2, line3))
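The note about Notepad versus type <filename> comes down to the same bytes being read with two different codecs. A minimal sketch of that effect follows; it assumes the console uses code page 850 (a guess for a Swedish Windows console, not something stated above) and is written to run on both Python 2.7 and Python 3.

```python
# The word "läget" as Unicode text, written with an escape so the
# source file itself stays pure ASCII.
text = u"l\xe4get"

# These are the bytes that end up in a file written as UTF-8.
utf8_bytes = text.encode("utf-8")

# A UTF-8 aware editor decodes the bytes with the codec they were written in.
as_utf8 = utf8_bytes.decode("utf-8")

# Assumption: the console decodes with code page 850 instead, so the
# two-byte UTF-8 sequence for "ä" is shown as two unrelated characters.
as_cp850 = utf8_bytes.decode("cp850")
```

Here as_utf8 equals the original text, while as_cp850 is one character longer and no longer contains "ä". The file is fine in both cases; only the viewer's choice of codec differs.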
#2
3
Your output indicates that raw_input() already accepts Å and ä just fine in your environment.
Either your code does not correspond to the output or your IDE is too helpful. raw_input() should return the str type (bytes), but the output shows that you're saving text representations of unicode objects: u'hej' u'\xc5je' u'l\xe4get'.
The smallest code change that would produce the desired result is to use %s (save the string as is) instead of %r (save its ASCII printable representation, as returned by the repr() function) in the format string, as suggested in @chepner's answer.
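To make the %r versus %s difference concrete, here is a small sketch that formats the same value both ways. The variable names are only for illustration. The exact text produced by %r differs between versions: Python 2 gives the escaped form seen in the question (u'\xc5je'), while Python 3 keeps the character and only adds quotes.

```python
# "Åje" as Unicode text, written with an escape to keep the source ASCII.
line = u"\xc5je"

# %r formats the value with repr(): quotes (and, on Python 2, the u prefix
# and a backslash escape) become part of the resulting string.
as_repr = u"%r" % line

# %s formats the value as is: the result is just the text itself.
as_text = u"%s" % line
```

Writing as_repr to the file is what produced the quoted, escaped lines in test.txt; writing as_text stores the characters the user typed.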
"If someone would take some time to explain the encoding/decoding of Unicode characters in a beginner-friendly way and give me an example of how I can get it to work, I would really appreciate it."
Unicode handling in Python 2 requires understanding which APIs return text and which return binary data. Some APIs use a mixture of the two, such as ASCII-based network protocols.
Python 2 allows the str type to represent both human-readable text and binary data, which can cause confusion. I recommend starting with Python 3, which is stricter about Unicode-related issues.
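The text versus binary distinction can be seen by encoding a single word and comparing the two objects. This is only an illustrative sketch, with names chosen for the example, and it runs unchanged on Python 2.7 and Python 3.

```python
# Unicode text: a sequence of characters ("läget" has 5 of them).
text = u"l\xe4get"

# Binary data: the same word encoded as UTF-8, where "ä" takes two bytes.
data = text.encode("utf-8")

text_length = len(text)   # counts characters
data_length = len(data)   # counts bytes

# Decoding with the same codec turns the bytes back into the original text.
round_trip = data.decode("utf-8")
```

On Python 2 both kinds of value can be mixed silently as long as they are pure ASCII, which is why problems only appear once a character such as "ä" shows up. Python 3 refuses to mix them at all.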
In general, when working with Unicode you should convert encoded text into Unicode on input as soon as possible (e.g., using .decode()) and convert Unicode text to bytes on output as late as possible. @Mark Tolonen's answer demonstrates this approach:
- it uses .decode(sys.stdin.encoding) to decode the bytes returned from raw_input() into Unicode text. If raw_input() already returns Unicode in your environment (to check: print type(raw_input('input something'))), then you can omit the .decode() call
- io.open(..., encoding='utf-8').write(u'some text') converts Unicode text to bytes (encodes it using the UTF-8 encoding).
This general approach is known as the Unicode sandwich.
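The three layers of the sandwich can be sketched without a console by starting from raw bytes. The function save_lines and the sample data are hypothetical, and the input bytes are assumed to be in code page 850. The code writes to a temporary file so it can be run safely, and it works on Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def save_lines(raw_lines, input_encoding, path):
    """Decode early, work with text, encode late."""
    # Input edge: bytes -> Unicode as soon as the data arrives.
    text_lines = [raw.decode(input_encoding) for raw in raw_lines]
    # Middle: everything here handles Unicode text only.
    body = u"".join(line + u"\n" for line in text_lines)
    # Output edge: io.open encodes the text to UTF-8 bytes when writing.
    with io.open(path, "w", encoding="utf-8", newline="\n") as target:
        target.write(body)
    return body


# Assumption: these are the cp850 bytes for "hej", "Åje" and "läget".
raw_input_lines = [b"hej", b"\x8fje", b"l\x84get"]

handle, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(handle)
written_text = save_lines(raw_input_lines, "cp850", demo_path)

# Read the file back as raw bytes to see what was actually stored.
with io.open(demo_path, "rb") as source:
    file_bytes = source.read()
os.remove(demo_path)
```

The bytes in the file are the UTF-8 encoding of the text, regardless of which encoding the input arrived in. Only the two edges of the program need to know about encodings.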
.decode(sys.stdin.encoding) may fail. To support arbitrary Unicode input in the Windows console, install the win-unicode-console Python package.
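One way the call can fail is when sys.stdin.encoding is None, or when the value is already Unicode. The helper below is a hypothetical sketch (the name to_text and the UTF-8 fallback are assumptions, not part of any answer here) that guards against both cases. It does not solve the separate problem of characters the console code page cannot represent, which is what the package mentioned above is for.

```python
import sys


def to_text(value, encoding=None):
    """Return value as Unicode text, decoding only when it is bytes."""
    if not isinstance(value, bytes):
        # Already Unicode (some IDEs return this), so nothing to decode.
        return value
    # sys.stdin.encoding may be None or missing, for example when input is
    # piped or an IDE replaces stdin. Falling back to UTF-8 is an assumption.
    chosen = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
    return value.decode(chosen)


# Bytes in an explicitly named encoding (cp850 assumed for the example).
decoded = to_text(b"l\x84get", "cp850")

# A value that is already Unicode passes through unchanged.
unchanged = to_text(u"hej")
```

In the script from the first answer, this would be used as line1 = to_text(raw_input("line 1: ")).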
#3
1
You're writing a representation of the string, rather than the actual encoded Unicode string, to your file. Use
target.write("%s\n%s\n%s\n" % (line1, line2, line3))
instead.
#4
0
You can use this format:
f = open('file.txt', 'w')
s = u'\u221A'
f.write(s.encode('utf-8'))  # encode the Unicode text to UTF-8 bytes before writing
f.close()
Here: line1 = raw_input("> ").encode('utf-8')
The same goes for line2 and line3. Note that .encode('utf-8') only works here if raw_input() returns a unicode object, as your output suggests; on a plain str containing non-ASCII bytes it raises UnicodeDecodeError.
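The manual approach in this answer, encoding the text yourself and writing the resulting bytes, can be sketched as follows. It uses a temporary file and binary mode so that the same code runs on Python 2.7 and Python 3. The variable names are for illustration only.

```python
import os
import tempfile

# U+221A SQUARE ROOT as Unicode text.
symbol = u"\u221a"

# Encode explicitly: Unicode text -> UTF-8 bytes.
encoded = symbol.encode("utf-8")

handle, demo_path = tempfile.mkstemp()
os.close(handle)

# Binary mode, because the data is already bytes at this point.
with open(demo_path, "wb") as f:
    f.write(encoded)

# Read the bytes back to confirm what was stored.
with open(demo_path, "rb") as f:
    stored = f.read()
os.remove(demo_path)
```

The trade-off compared with io.open(..., encoding='utf-8') is that every write needs its own .encode() call, so it is easier to forget one.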