According to the documentation, shlex should support Unicode in Python 2.7.3. However, when I run the code below, I get: UnicodeEncodeError: 'ascii' codec can't encode characters in position 184-189: ordinal not in range(128)
Am I doing something wrong?
import shlex
command_full = u'software.py -fileA="sequence.fasta" -fileB="新建文本文档.fasta.txt" -output_dir="..." -FORMtitle="tst"'
shlex.split(command_full)
The exact error is as follows:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shlex.py", line 275, in split
    lex = shlex(s, posix=posix)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shlex.py", line 25, in __init__
    instream = StringIO(instream)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 44-49: ordinal not in range(128)
This is the output on my Mac, using Python from MacPorts. I get exactly the same error on an Ubuntu machine with the "native" Python 2.7.3.
2 solutions
#1
11
The shlex.split() code wraps both unicode() and str() instances in a StringIO() object. In Python 2.7 that is normally cStringIO, which holds bytes only: a unicode() value is implicitly encoded with the default ASCII codec, so anything outside ASCII fails (hence the 'ascii' codec in your traceback), never mind the full Unicode codepoint range.
You'll have to encode explicitly (UTF-8 should work) if you still want to use shlex.split(); the maintainers of the module meant that unicode() objects are accepted now, just not anything outside the range that implicit encoding can handle.
Encoding, splitting, decoding gives me:
>>> map(lambda s: s.decode('UTF8'), shlex.split(command_full.encode('utf8')))
[u'software.py', u'-fileA=sequence.fasta', u'-fileB=\u65b0\u5efa\u6587\u672c\u6587\u6863.fasta.txt', u'-output_dir=...', u'-FORMtitle=tst']
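If you need this in more than one place, the same encode/split/decode round trip could be wrapped in a small helper. This is only a minimal sketch: the name split_unicode and its encoding parameter are hypothetical and not part of shlex, and the Python 3 branch assumes that shlex there accepts text directly, so no workaround is needed.

```python
import shlex
import sys


def split_unicode(command, encoding='utf-8'):
    """Split a unicode command line into unicode tokens.

    Hypothetical helper for illustration; not part of the shlex module.
    """
    if sys.version_info[0] >= 3:
        # Python 3: shlex works on text, so split directly.
        return shlex.split(command)
    # Python 2: give shlex encoded bytes, then decode each token again.
    return [token.decode(encoding)
            for token in shlex.split(command.encode(encoding))]


tokens = split_unicode(u'software.py -fileB="\u65b0\u5efa.fasta.txt" -FORMtitle="tst"')
```

UTF-8 is the default here because it can represent every codepoint; a narrower encoding would raise UnicodeEncodeError on input it cannot represent.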
A now-closed Python issue tried to address this, but the module is very byte-stream oriented, and no new patch has materialized. For now, using iso-8859-1 or UTF-8 encoding is the best I can come up with for you.
#2
2
Actually there's been a patch for over five years. Last year I got tired of copying a ushlex around in every project and put it on PyPI:
https://pypi.python.org/pypi/ushlex/