I'm trying to use struct.pack to write a padded string to a file but it seems with the 3.x interpreters this doesn't work anymore. An example of how I'm using it:
mystring = anotherString+" sometext here"
output = struct.pack("30s", mystring)
This seems to be okay in earlier versions of Python, but with 3 it produces an error demanding a bytes object. The docs seem to imply that it is supposed to convert any string to a UTF-8 bytes object without complaint (and I don't care if a multi-byte character happens to be truncated):
http://docs.python.org/release/3.1.5/library/struct.html: "The c, s and p conversion codes operate on bytes objects, but packing with such codes also supports str objects, which are encoded using UTF-8."
Am I misreading the docs and how are others using struct.pack with strings?
2 Answers
#1
10
Yes, up until 3.1 struct.pack() erroneously would implicitly encode strings to UTF-8 bytes; this was fixed in Python 3.2. See issue 10783.
The conclusion was that the implicit conversion was a Bad Idea, and it was reverted while the developers still had a chance to do so:
I prefer to break the API today than having to maintain a broken API for 10 or 20 years :-) And we have a very small user base using Python 3, it's easier to change it now, than in the next release.
This is also documented in the porting section of the 3.2 What's New guide:
struct.pack() now only allows bytes for the s string pack code. Formerly, it would accept text arguments and implicitly encode them to bytes using UTF-8. This was problematic because it made assumptions about the correct encoding and because a variable-length encoding can fail when writing to fixed length segment of a structure.
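To make the fixed-length problem concrete, here is a small sketch (the sample string and the 4-byte field width are my own choices for illustration, not taken from the docs). The `s` code silently cuts the bytes to the field width, which can split a multi-byte UTF-8 character:

```python
import struct

raw = "ab€".encode("utf-8")      # 5 bytes: the euro sign takes three bytes
packed = struct.pack("4s", raw)  # "4s" keeps only the first 4 bytes

# The last character was cut in half, so the field no longer decodes.
try:
    packed.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```

If you really don't care about a truncated trailing character, this is harmless on the writing side, but whoever reads the field back has to tolerate the partial byte sequence.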
You need to explicitly encode your strings before packing.
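As a rough sketch of what that could look like: the helper name `pack_fixed` and its parameters are hypothetical (not part of the `struct` module), and the sample strings are made up. It encodes explicitly and, when the text is too long, trims on a character boundary rather than mid-character:

```python
import struct

def pack_fixed(text, size, encoding="utf-8"):
    """Encode text and pack it into exactly `size` bytes, null-padded.

    Hypothetical helper for illustration; not part of the struct module.
    """
    data = text.encode(encoding)
    if len(data) > size:
        # Cut to size, then drop any partial character left at the end.
        data = data[:size].decode(encoding, errors="ignore").encode(encoding)
    # The "s" code pads short values with null bytes up to the field width.
    return struct.pack("{}s".format(size), data)

another_string = "prefix"  # stand-in for anotherString from the question
output = pack_fixed(another_string + " sometext here", 30)
truncated = pack_fixed("ab€", 4)
```

If a split character at the end is acceptable, the trimming step can be dropped and a plain `mystring.encode("utf-8")` passed to `struct.pack` is enough.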
#2
0
I could be wrong, but in this case won't .encode('UTF-8') work? E.g.:
output = struct.pack("30s", mystring.encode('UTF-8'))
I stand to be corrected.
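For what it's worth, a quick round trip suggests the approach holds up. This is only a sketch: the sample string is made up, and `io.BytesIO` stands in for a real file opened in binary mode.

```python
import io
import struct

mystring = "some text here"
record = struct.pack("30s", mystring.encode("UTF-8"))

buffer = io.BytesIO()  # stands in for a file opened with mode "wb"
buffer.write(record)

# Reading back: unpack the 30-byte field, strip the null padding, decode.
(field,) = struct.unpack("30s", buffer.getvalue())
restored = field.rstrip(b"\x00").decode("UTF-8")
```

The null padding added by the `s` code has to be stripped when reading, since `struct.unpack` returns the full 30 bytes.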