Let's say I have a string (Unicode, if it matters) variable that is less than 100 bytes long. I want to create another variable that is exactly 100 bytes in size, contains this string, and is padded with zeros or anything else. How would I do this in Python 3?
5 Solutions
#1
5
For assembling packets to go over the network, or for assembling byte-perfect binary files, I suggest using the struct module.
- struct — Interpret bytes as packed binary data
Just for the string, you might not need struct, but as soon as you also start packing binary values, struct will make your life much easier.
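As a rough sketch of what that could look like: the "100s" format code in struct pads a short bytes value with null bytes up to 100 bytes (and silently truncates a longer one). The packet layout below, a 2-byte message id followed by the name field, is a made-up assumption for illustration only.

```python
import struct

# Hypothetical packet layout (an assumption, not from the question):
# a 2-byte big-endian unsigned message id, then a fixed 100-byte name field.
# ">" disables alignment padding; "100s" null-pads short values to 100 bytes.
PACKET = struct.Struct(">H100s")

name = "具有".encode("utf-8")  # 6 bytes once encoded
packet = PACKET.pack(7, name)

# Unpacking returns the full 100-byte field, so strip the padding afterwards.
msg_id, raw_name = PACKET.unpack(packet)
decoded_name = raw_name.rstrip(b"\x00").decode("utf-8")
```

Note that "100s" truncates values longer than 100 bytes without raising, so a length check before packing is probably worth adding.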
Depending on your needs, you might be better off with an off-the-shelf network serialization library, such as Protocol Buffers; or you might even just use JSON for the wire format.
- Protocol Buffer Basics: Python
- PyMOTW - JavaScript Object Notation Serializer
#2
4
Something like this should work:
st = "具有"
by = bytes(st, "utf-8")
by += b"0" * (100 - len(by))
print(by)
# b'\xe5\x85\xb7\xe6\x9c\x890000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
Obligatory addendum, since your original post seems to conflate the length of a string with the length of its encoded byte representation: Python unicode explanation
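To make the distinction concrete, here is a small sketch comparing the character count of a str with the byte count of its UTF-8 encoding. The pad_to helper is a hypothetical name introduced here, not part of the answer above; it pads on the right and refuses input that is already too long.

```python
text = "具有"
encoded = text.encode("utf-8")

char_count = len(text)     # number of code points in the str
byte_count = len(encoded)  # number of bytes after encoding


def pad_to(data: bytes, size: int = 100, fill: bytes = b"\x00") -> bytes:
    """Right-pad data to exactly size bytes (hypothetical helper)."""
    if len(data) > size:
        raise ValueError(f"data is {len(data)} bytes, larger than {size}")
    # bytes.ljust requires a fill value that is exactly one byte long.
    return data.ljust(size, fill)


padded = pad_to(encoded)
```

The padding has to be computed from the encoded length, because one character can occupy several bytes.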
#3
3
You could use the bytes.zfill method to add the required number of zeroes:
In [19]: result = bytes('おくりびと', 'utf-8').zfill(100)
In [20]: result
Out[20]: b'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000\xe3\x81\x8a\xe3\x81\x8f\xe3\x82\x8a\xe3\x81\xb3\xe3\x81\xa8'
In [21]: len(result)
Out[21]: 100
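Two details of zfill may matter here, sketched below: it pads on the left with the ASCII character "0" (byte 0x30) rather than a null byte, and it treats a leading "+" or "-" as a numeric sign. The comparison with bytes.ljust and bytes.rjust is added for illustration and is not part of the answer above.

```python
data = "おくりびと".encode("utf-8")  # 15 bytes once encoded

left_padded = data.zfill(100)            # ASCII b"0" bytes added on the left
right_padded = data.ljust(100, b"\x00")  # null bytes added on the right

# zfill keeps a leading sign in front and inserts the zeros after it,
# whereas rjust pads strictly on the left.
signed_zfill = b"-5".zfill(5)
signed_rjust = b"-5".rjust(5, b"0")
```

If the payload can begin with "+" or "-", or if the padding must sit after the data, ljust or rjust with an explicit fill byte is likely the safer choice.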
#4
2
To pad with null bytes you can do it the way they do it in the stdlib base64 module.
some_data = b'foosdsfkl\x05'
null_padded = some_data + bytes(100 - len(some_data))
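A short sketch of the round trip, assuming the receiver strips the padding again with rstrip. This only works when the original data never ends in a null byte of its own, which is an assumption about the payload.

```python
some_data = b"foosdsfkl\x05"

# bytes(n) creates n zero-valued bytes; a negative n raises ValueError,
# which doubles as a check that the data is not already too long.
null_padded = some_data + bytes(100 - len(some_data))

# Recover the payload by dropping trailing null bytes
# (assumes the payload itself does not end with b"\x00").
restored = null_padded.rstrip(b"\x00")
```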
#5
1
Here's a roundabout way of doing it:
>>> import sys
>>> a = "a"
>>> sys.getsizeof(a)
22
>>> a = "aa"
>>> sys.getsizeof(a)
23
>>> a = "aaa"
>>> sys.getsizeof(a)
24
Following this pattern (21 bytes of object overhead plus one byte per ASCII character on this build), a string needs to be 79 characters long for sys.getsizeof to report 100 bytes:
>>> a = "".join(["a" for i in range(79)])
>>> len(a)
79
>>> sys.getsizeof(a)
100
The approach above is a fairly simple way of "calibrating" strings to find the length that gives a target in-memory size. You could automate this in a script that pads a string out to the appropriate memory size, which would also account for other encodings.
import sys

def padder(strng):
    TARGETSIZE = 100
    padChar = "0"
    curSize = sys.getsizeof(strng)
    if curSize <= TARGETSIZE:
        # Prepend one pad character for each byte still missing
        for i in range(TARGETSIZE - curSize):
            strng = padChar + strng
        return strng
    else:
        return strng  # Not sure if you need to handle strings that start longer than your target, but you can do that here
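One caveat worth checking before relying on this: sys.getsizeof reports the size of the Python object in memory, which includes interpreter overhead and differs between Python versions and platforms. It is not the number of bytes produced when the string is encoded. The sketch below assumes CPython and simply compares the two numbers.

```python
import sys

text = "a" * 79

# In-memory footprint of the str object (CPython-specific; the exact
# value depends on the interpreter version and platform).
object_size = sys.getsizeof(text)

# Number of bytes the text occupies once encoded for a file or socket.
encoded_size = len(text.encode("utf-8"))
```

If the goal is a 100-byte field in a file or network packet, encoded_size is the figure to pad against.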