Python 3.3: struct.pack won't accept strings

Date: 2022-01-30 18:17:28

I'm trying to use struct.pack to write a padded string to a file but it seems with the 3.x interpreters this doesn't work anymore. An example of how I'm using it:

mystring = anotherString+" sometext here"
output = struct.pack("30s", mystring)

This seems to be okay in earlier versions of Python, but with 3 it produces an error demanding a bytes object. The docs seem to imply that it is supposed to convert any string to a UTF-8 bytes object without complaint (and I don't care if a multi-byte character happens to be truncated):

http://docs.python.org/release/3.1.5/library/struct.html: "The c, s and p conversion codes operate on bytes objects, but packing with such codes also supports str objects, which are encoded using UTF-8."

Am I misreading the docs and how are others using struct.pack with strings?

2 answers

#1


10  

Yes, up to and including 3.1, struct.pack() would erroneously encode strings to UTF-8 bytes implicitly; this was fixed in Python 3.2. See issue 10783.

The conclusion was that the implicit conversion was a Bad Idea, and it was reverted while the developers still had a chance to do so:

I prefer to break the API today than having to maintain a broken API for 10 or 20 years :-) And we have a very small user base using Python 3, it's easier to change it now, than in the next release.

This is also documented in the porting section of the 3.2 What's New guide:

struct.pack() now only allows bytes for the s string pack code. Formerly, it would accept text arguments and implicitly encode them to bytes using UTF-8. This was problematic because it made assumptions about the correct encoding and because a variable-length encoding can fail when writing to fixed length segment of a structure.
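To illustrate the second point in that note, here is a minimal sketch of how a variable-length encoding can go wrong in a fixed-length field. The sample text and the 4-byte field width are made up for the demonstration:

```python
import struct

text = "caf\u00e9"                 # 4 characters ("cafe" with an e-acute)
encoded = text.encode("utf-8")     # 5 bytes: the accented letter takes two bytes

# The "4s" field keeps only the first 4 bytes, which splits the last character.
packed = struct.pack("4s", encoded)

print(len(text), len(encoded))     # 4 5
print(packed)                      # b'caf\xc3'

try:
    packed.decode("utf-8")
except UnicodeDecodeError as exc:
    # The truncated field no longer holds valid UTF-8.
    print("cannot decode truncated field:", exc)
```

The character count and the byte count differ, so a field sized by characters may quietly cut a multi-byte character in half.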

You need to explicitly encode your strings before packing.
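A minimal sketch of what that looks like for the code in the question, assuming Python 3.2 or later and that UTF-8 is the encoding you want. The value of another_string is a made-up stand-in for the question's anotherString:

```python
import struct

another_string = "prefix"          # hypothetical stand-in for anotherString
mystring = another_string + " sometext here"

# Passing the str directly is rejected on Python 3.2+.
rejected = False
try:
    struct.pack("30s", mystring)
except struct.error as exc:
    rejected = True
    print("str rejected:", exc)

# Encoding first works; shorter values are padded with null bytes to 30 bytes.
output = struct.pack("30s", mystring.encode("utf-8"))
print(len(output))                 # 30
print(output)
```

Any other codec can be passed to encode() instead; the point is that the choice is now made by your code rather than by struct.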

#2


0  

I could be wrong, but in this case won't .encode('UTF-8') work? E.g.:

output = struct.pack("30s", mystring.encode('UTF-8'))

I stand to be corrected.
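For what it's worth, a quick round trip suggests this approach works. The sketch below packs an encoded string, unpacks it again, and strips the padding; the sample text is made up, and stripping null bytes assumes the original data never ends in a null byte itself:

```python
import struct

packed = struct.pack("30s", "sometext here".encode("utf-8"))

# unpack() returns a tuple; the "30s" field comes back as 30 bytes.
(raw,) = struct.unpack("30s", packed)

# Drop the null padding added by pack(), then decode back to str.
text = raw.rstrip(b"\x00").decode("utf-8")
print(repr(text))                  # 'sometext here'
```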
