I have a fairly large byte array in python. In the simplest situation the byte array only contains 0 or 1 values (0x00, 0x01), also the array is always a multiple of 8 in length. How can I pack these "bits" into another byte array (it doesn't need to be mutable) so the source index zero goes to the MSB of the first output byte etc.
我在python中有一个相当大的字节数组。在最简单的情况下,字节数组只包含0或1个值(0x00,0x01),数组也始终是8的倍数。如何将这些“位”打包到另一个字节数组中(它不需要是可变的),因此源索引0将转到第一个输出字节的MSB等。
For example if src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
Desired output would be b'\x89\xe2\xff'
.
例如,如果src = bytearray([1,0,0,0,1,0,0,1,1,1,1,0,0,0,1,0,1,1,1,1,1, 1,1,1])期望的输出将是b'\ x89 \ xe2 \ xff'。
I could do it with a for loop and bit shifting and or-ing and concatenation, but there surely is a faster/better built-in way to do this.
我可以通过for循环和位移以及or-ing和concatenation来做到这一点,但肯定有更快/更好的内置方式来做到这一点。
In a follow up question, I also might want to have the source byte array contain values from the set 0-3 and pack these 4 at a time into the output array. Is there a way of doing that?
在后续问题中,我也可能希望源字节数组包含来自集0-3的值,并将这4个一次打包到输出数组中。有没有办法做到这一点?
In general is there a way of interpreting elements of a list as true or false and packing them 8 at a time into a byte array?
通常有一种方法可以将列表的元素解释为true或false,并将它们一次打包成8个字节数组吗?
2 个解决方案
#1
3
As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int
, much as the fastest way to count 1-bits in an int
is bin(n).count('1')
. And it's dead simple, too:
听起来很荒谬,使用内置函数的最快解决方案可能是构建一个字符串并将其传递给int,就像在int中计算1位的最快方法是bin(n).count('1')。它也很简单:
def unbitify_byte(src):
s = ''.join(map(str, src))
n = int(s, 2)
return n.to_bytes(len(src)//8, 'big')
Equivalent (but marginally more complex) code using gmpy2
instead of native Python int
is a bit faster.
使用gmpy2而不是本机Python int的等效(但稍微复杂一点)的代码要快一些。
And you can extend it to 2-bit values pretty easily:
您可以非常轻松地将其扩展为2位值:
def unhalfnybblify_byte(src):
s = ''.join(map(str, src))
n = int(s, 4)
return n.to_bytes(len(src)//4, 'big')
If you want something more flexible, but possibly slower, here's a simple solution using ctypes
.
如果你想要更灵活,但可能更慢的东西,这里有一个使用ctypes的简单解决方案。
If you know C, you can probably see a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:
如果您了解C,您可能会看到8个单比特位字段的结构将在这里派上用场。你可以在Python中编写等效的struct类型,如下所示:
class Bits(ctypes.Structure):
_fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]
And you can construct one of them from 8 ints that are all 0 or 1:
你可以从8个全部为0或1的整数构建其中一个:
bits = Bits(*src[:8])
And you can convert that to a single int by using an ugly cast or a simple union:
你可以使用丑陋的演员或简单的联合将它转换为单个int:
class UBits(ctypes.Union):
_fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]
i = UBits(Bits(*src[:8])).i
So now it's just a matter of chunking src
into groups of 8 in big-endian order:
所以现在只需要将src分成8个以大端序排列的组:
chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)
And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.
如何将其扩展到每个字节的四个2位字段,或两个4位字段,甚至两个3位字段和一个2位字段应该是非常明显的。
However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.
然而,尽管看起来像低级C代码,但它可能更慢。不过,它可能值得测试,看看它是否足够快,适合您的使用。
A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy
is the obvious answer. You can't get any simpler than this:
自定义C扩展可能会做得更好。并且PyPI上有许多位阵列类型的模块可供试用。但如果你想沿着这条路前进,那么numpy就是明显的答案。你不能比这更简单:
np.packbits(src)
(A bytearray
works just fine as an "array-like".)
(bytearray就像“类似数组”一样工作得很好。)
It's also hard to beat for speed.
速度也很难打败。
For comparison, here's some measurements:
为了比较,这里有一些测量:
- 60ns/byte + 0.3µs:
np.packbits
on anarray
instead of abytearray
- 60ns/byte + 1.9µs:
np.packbits
- 440ns/byte + 3.2µs:
for
and bit-twiddling in PyPy instead of CPython - 570µs/byte + 3.8µs:
int(…, 2).to_bytes(…)
in PyPy instead of CPython - 610ns/byte + 9.1µs:
bitarray
- 800ns/byte + 2.9µs:
gmpy.mpz(…)…
- 1.0µs/byte + 2.8µs:
int(…, 2).to_bytes(…)
- 2.9µs/byte + 0.2µs:
(UBits(Bits(*chunk)) …)
- 16.µs/byte + 0.9µs:
for
and bit-twiddling
60ns / byte +0.3μs:数组上的np.packbits而不是bytearray
60ns / byte +1.9μs:np.packbits
440ns / byte +3.2μs:用于PyPy而不是CPython中的bit-twiddling
570μs/ byte +3.8μs:PyPy中的int(...,2).to_bytes(...)而不是CPython
610ns /字节+9.1μs:bitarray
800ns /字节+2.9μs:gmpy.mpz(...)......
1.0μs/ byte +2.8μs:int(...,2).to_bytes(...)
2.9μs/字节+0.2μs:( UBits(Bits(* chunk))...)
16.μs/ byte +0.9μs:for和bit-twiddling
#2
0
Using numpy, with test code and comments:
使用numpy,带有测试代码和注释:
#!/usr/bin/env python3
import numpy as np
def pack_bits(a):
# big-endian - use '<u8' if you want little-endian
#0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
b = np.copy(a.view('>u8'))
#0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
b |= b >> 7
#0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
b |= b >> 14
#0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
b |= b >> 28
return np.array(b, dtype='u1')
def main():
a = []
for i in range(256):
# build 8-bit lists without numpy, then convert
a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
a = np.array(a)
print(a)
b = pack_bits(a)
print(b)
if __name__ == '__main__':
main()
Similar code exists for other deinterleaving, bit since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG
does not overlap the ABCDEFGH
).
由于输入之间的位数小于一个字中的字节数,因此存在其他去交错位的类似代码,我们可以避免屏蔽(注意0ABCDEFG不与ABCDEFGH重叠)。
#1
3
As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int
, much as the fastest way to count 1-bits in an int
is bin(n).count('1')
. And it's dead simple, too:
听起来很荒谬,使用内置函数的最快解决方案可能是构建一个字符串并将其传递给int,就像在int中计算1位的最快方法是bin(n).count('1')。它也很简单:
def unbitify_byte(src):
s = ''.join(map(str, src))
n = int(s, 2)
return n.to_bytes(len(src)//8, 'big')
Equivalent (but marginally more complex) code using gmpy2
instead of native Python int
is a bit faster.
使用gmpy2而不是本机Python int的等效(但稍微复杂一点)的代码要快一些。
And you can extend it to 2-bit values pretty easily:
您可以非常轻松地将其扩展为2位值:
def unhalfnybblify_byte(src):
s = ''.join(map(str, src))
n = int(s, 4)
return n.to_bytes(len(src)//4, 'big')
If you want something more flexible, but possibly slower, here's a simple solution using ctypes
.
如果你想要更灵活,但可能更慢的东西,这里有一个使用ctypes的简单解决方案。
If you know C, you can probably see a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:
如果您了解C,您可能会看到8个单比特位字段的结构将在这里派上用场。你可以在Python中编写等效的struct类型,如下所示:
class Bits(ctypes.Structure):
_fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]
And you can construct one of them from 8 ints that are all 0 or 1:
你可以从8个全部为0或1的整数构建其中一个:
bits = Bits(*src[:8])
And you can convert that to a single int by using an ugly cast or a simple union:
你可以使用丑陋的演员或简单的联合将它转换为单个int:
class UBits(ctypes.Union):
_fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]
i = UBits(Bits(*src[:8])).i
So now it's just a matter of chunking src
into groups of 8 in big-endian order:
所以现在只需要将src分成8个以大端序排列的组:
chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)
And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.
如何将其扩展到每个字节的四个2位字段,或两个4位字段,甚至两个3位字段和一个2位字段应该是非常明显的。
However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.
然而,尽管看起来像低级C代码,但它可能更慢。不过,它可能值得测试,看看它是否足够快,适合您的使用。
A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy
is the obvious answer. You can't get any simpler than this:
自定义C扩展可能会做得更好。并且PyPI上有许多位阵列类型的模块可供试用。但如果你想沿着这条路前进,那么numpy就是明显的答案。你不能比这更简单:
np.packbits(src)
(A bytearray
works just fine as an "array-like".)
(bytearray就像“类似数组”一样工作得很好。)
It's also hard to beat for speed.
速度也很难打败。
For comparison, here's some measurements:
为了比较,这里有一些测量:
- 60ns/byte + 0.3µs:
np.packbits
on anarray
instead of abytearray
- 60ns/byte + 1.9µs:
np.packbits
- 440ns/byte + 3.2µs:
for
and bit-twiddling in PyPy instead of CPython - 570µs/byte + 3.8µs:
int(…, 2).to_bytes(…)
in PyPy instead of CPython - 610ns/byte + 9.1µs:
bitarray
- 800ns/byte + 2.9µs:
gmpy.mpz(…)…
- 1.0µs/byte + 2.8µs:
int(…, 2).to_bytes(…)
- 2.9µs/byte + 0.2µs:
(UBits(Bits(*chunk)) …)
- 16.µs/byte + 0.9µs:
for
and bit-twiddling
60ns / byte +0.3μs:数组上的np.packbits而不是bytearray
60ns / byte +1.9μs:np.packbits
440ns / byte +3.2μs:用于PyPy而不是CPython中的bit-twiddling
570μs/ byte +3.8μs:PyPy中的int(...,2).to_bytes(...)而不是CPython
610ns /字节+9.1μs:bitarray
800ns /字节+2.9μs:gmpy.mpz(...)......
1.0μs/ byte +2.8μs:int(...,2).to_bytes(...)
2.9μs/字节+0.2μs:( UBits(Bits(* chunk))...)
16.μs/ byte +0.9μs:for和bit-twiddling
#2
0
Using numpy, with test code and comments:
使用numpy,带有测试代码和注释:
#!/usr/bin/env python3
import numpy as np
def pack_bits(a):
# big-endian - use '<u8' if you want little-endian
#0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
b = np.copy(a.view('>u8'))
#0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
b |= b >> 7
#0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
b |= b >> 14
#0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
b |= b >> 28
return np.array(b, dtype='u1')
def main():
a = []
for i in range(256):
# build 8-bit lists without numpy, then convert
a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
a = np.array(a)
print(a)
b = pack_bits(a)
print(b)
if __name__ == '__main__':
main()
Similar code exists for other deinterleaving, bit since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG
does not overlap the ABCDEFGH
).
由于输入之间的位数小于一个字中的字节数,因此存在其他去交错位的类似代码,我们可以避免屏蔽(注意0ABCDEFG不与ABCDEFGH重叠)。