如何在python3中有效地将位从一个字节数组打包到另一个字节数组?

时间:2022-02-08 07:12:28

I have a fairly large byte array in python. In the simplest situation the byte array only contains 0 or 1 values (0x00, 0x01), also the array is always a multiple of 8 in length. How can I pack these "bits" into another byte array (it doesn't need to be mutable) so the source index zero goes to the MSB of the first output byte etc.

我在python中有一个相当大的字节数组。在最简单的情况下,字节数组只包含0或1个值(0x00,0x01),数组也始终是8的倍数。如何将这些“位”打包到另一个字节数组中(它不需要是可变的),因此源索引0将转到第一个输出字节的MSB等。

For example if src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1]) Desired output would be b'\x89\xe2\xff'.

例如,如果src = bytearray([1,0,0,0,1,0,0,1,1,1,1,0,0,0,1,0,1,1,1,1,1, 1,1,1])期望的输出将是b'\ x89 \ xe2 \ xff'。

I could do it with a for loop and bit shifting and or-ing and concatenation, but there surely is a faster/better built-in way to do this.

我可以通过for循环和位移以及or-ing和concatenation来做到这一点,但肯定有更快/更好的内置方式来做到这一点。

In a follow up question, I also might want to have the source byte array contain values from the set 0-3 and pack these 4 at a time into the output array. Is there a way of doing that?

在后续问题中,我也可能希望源字节数组包含来自集0-3的值,并将这4个一次打包到输出数组中。有没有办法做到这一点?

In general is there a way of interpreting elements of a list as true or false and packing them 8 at a time into a byte array?

通常有一种方法可以将列表的元素解释为true或false,并将它们一次打包成8个字节数组吗?

2 个解决方案

#1


3  

As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int, much as the fastest way to count 1-bits in an int is bin(n).count('1'). And it's dead simple, too:

听起来很荒谬,使用内置函数的最快解决方案可能是构建一个字符串并将其传递给int,就像在int中计算1位的最快方法是bin(n).count('1')。它也很简单:

def unbitify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 2)
    return n.to_bytes(len(src)//8, 'big')

Equivalent (but marginally more complex) code using gmpy2 instead of native Python int is a bit faster.

使用gmpy2而不是本机Python int的等效(但稍微复杂一点)的代码要快一些。

And you can extend it to 2-bit values pretty easily:

您可以非常轻松地将其扩展为2位值:

def unhalfnybblify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 4)
    return n.to_bytes(len(src)//4, 'big')

If you want something more flexible, but possibly slower, here's a simple solution using ctypes.

如果你想要更灵活,但可能更慢的东西,这里有一个使用ctypes的简单解决方案。

If you know C, you can probably see a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:

如果您了解C,您可能会看到8个单比特位字段的结构将在这里派上用场。你可以在Python中编写等效的struct类型,如下所示:

class Bits(ctypes.Structure):
    _fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]

And you can construct one of them from 8 ints that are all 0 or 1:

你可以从8个全部为0或1的整数构建其中一个:

bits = Bits(*src[:8])

And you can convert that to a single int by using an ugly cast or a simple union:

你可以使用丑陋的演员或简单的联合将它转换为单个int:

class UBits(ctypes.Union):
    _fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]

i = UBits(Bits(*src[:8])).i

So now it's just a matter of chunking src into groups of 8 in big-endian order:

所以现在只需要将src分成8个以大端序排列的组:

chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)

And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.

如何将其扩展到每个字节的四个2位字段,或两个4位字段,甚至两个3位字段和一个2位字段应该是非常明显的。

However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.

然而,尽管看起来像低级C代码,但它可能更慢。不过,它可能值得测试,看看它是否足够快,适合您的使用。


A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy is the obvious answer. You can't get any simpler than this:

自定义C扩展可能会做得更好。并且PyPI上有许多位阵列类型的模块可供试用。但如果你想沿着这条路前进,那么numpy就是明显的答案。你不能比这更简单:

np.packbits(src)

(A bytearray works just fine as an "array-like".)

(bytearray就像“类似数组”一样工作得很好。)

It's also hard to beat for speed.

速度也很难打败。


For comparison, here's some measurements:

为了比较,这里有一些测量:

  • 60ns/byte + 0.3µs: np.packbits on an array instead of a bytearray
  • 60ns / byte +0.3μs:数组上的np.packbits而不是bytearray

  • 60ns/byte + 1.9µs: np.packbits
  • 60ns / byte +1.9μs:np.packbits

  • 440ns/byte + 3.2µs: for and bit-twiddling in PyPy instead of CPython
  • 440ns / byte +3.2μs:用于PyPy而不是CPython中的bit-twiddling

  • 570µs/byte + 3.8µs: int(…, 2).to_bytes(…) in PyPy instead of CPython
  • 570μs/ byte +3.8μs:PyPy中的int(...,2).to_bytes(...)而不是CPython

  • 610ns/byte + 9.1µs: bitarray
  • 610ns /字节+9.1μs:bitarray

  • 800ns/byte + 2.9µs: gmpy.mpz(…)…
  • 800ns /字节+2.9μs:gmpy.mpz(...)......

  • 1.0µs/byte + 2.8µs: int(…, 2).to_bytes(…)
  • 1.0μs/ byte +2.8μs:int(...,2).to_bytes(...)

  • 2.9µs/byte + 0.2µs: (UBits(Bits(*chunk)) …)
  • 2.9μs/字节+0.2μs:( UBits(Bits(* chunk))...)

  • 16.µs/byte + 0.9µs: for and bit-twiddling
  • 16.μs/ byte +0.9μs:for和bit-twiddling

#2


0  

Using numpy, with test code and comments:

使用numpy,带有测试代码和注释:

#!/usr/bin/env python3
import numpy as np


def pack_bits(a):
    # big-endian - use '<u8' if you want little-endian
    #0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
    b = np.copy(a.view('>u8'))
    #0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
    b |= b >> 7
    #0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
    b |= b >> 14
    #0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
    b |= b >> 28
    return np.array(b, dtype='u1')

def main():
    a = []
    for i in range(256):
        # build 8-bit lists without numpy, then convert
        a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
    a = np.array(a)
    print(a)
    b = pack_bits(a)
    print(b)

if __name__ == '__main__':
    main()

Similar code exists for other deinterleaving, bit since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG does not overlap the ABCDEFGH).

由于输入之间的位数小于一个字中的字节数,因此存在其他去交错位的类似代码,我们可以避免屏蔽(注意0ABCDEFG不与ABCDEFGH重叠)。

#1


3  

As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int, much as the fastest way to count 1-bits in an int is bin(n).count('1'). And it's dead simple, too:

听起来很荒谬,使用内置函数的最快解决方案可能是构建一个字符串并将其传递给int,就像在int中计算1位的最快方法是bin(n).count('1')。它也很简单:

def unbitify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 2)
    return n.to_bytes(len(src)//8, 'big')

Equivalent (but marginally more complex) code using gmpy2 instead of native Python int is a bit faster.

使用gmpy2而不是本机Python int的等效(但稍微复杂一点)的代码要快一些。

And you can extend it to 2-bit values pretty easily:

您可以非常轻松地将其扩展为2位值:

def unhalfnybblify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 4)
    return n.to_bytes(len(src)//4, 'big')

If you want something more flexible, but possibly slower, here's a simple solution using ctypes.

如果你想要更灵活,但可能更慢的东西,这里有一个使用ctypes的简单解决方案。

If you know C, you can probably see a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:

如果您了解C,您可能会看到8个单比特位字段的结构将在这里派上用场。你可以在Python中编写等效的struct类型,如下所示:

class Bits(ctypes.Structure):
    _fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]

And you can construct one of them from 8 ints that are all 0 or 1:

你可以从8个全部为0或1的整数构建其中一个:

bits = Bits(*src[:8])

And you can convert that to a single int by using an ugly cast or a simple union:

你可以使用丑陋的演员或简单的联合将它转换为单个int:

class UBits(ctypes.Union):
    _fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]

i = UBits(Bits(*src[:8])).i

So now it's just a matter of chunking src into groups of 8 in big-endian order:

所以现在只需要将src分成8个以大端序排列的组:

chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)

And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.

如何将其扩展到每个字节的四个2位字段,或两个4位字段,甚至两个3位字段和一个2位字段应该是非常明显的。

However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.

然而,尽管看起来像低级C代码,但它可能更慢。不过,它可能值得测试,看看它是否足够快,适合您的使用。


A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy is the obvious answer. You can't get any simpler than this:

自定义C扩展可能会做得更好。并且PyPI上有许多位阵列类型的模块可供试用。但如果你想沿着这条路前进,那么numpy就是明显的答案。你不能比这更简单:

np.packbits(src)

(A bytearray works just fine as an "array-like".)

(bytearray就像“类似数组”一样工作得很好。)

It's also hard to beat for speed.

速度也很难打败。


For comparison, here's some measurements:

为了比较,这里有一些测量:

  • 60ns/byte + 0.3µs: np.packbits on an array instead of a bytearray
  • 60ns / byte +0.3μs:数组上的np.packbits而不是bytearray

  • 60ns/byte + 1.9µs: np.packbits
  • 60ns / byte +1.9μs:np.packbits

  • 440ns/byte + 3.2µs: for and bit-twiddling in PyPy instead of CPython
  • 440ns / byte +3.2μs:用于PyPy而不是CPython中的bit-twiddling

  • 570µs/byte + 3.8µs: int(…, 2).to_bytes(…) in PyPy instead of CPython
  • 570μs/ byte +3.8μs:PyPy中的int(...,2).to_bytes(...)而不是CPython

  • 610ns/byte + 9.1µs: bitarray
  • 610ns /字节+9.1μs:bitarray

  • 800ns/byte + 2.9µs: gmpy.mpz(…)…
  • 800ns /字节+2.9μs:gmpy.mpz(...)......

  • 1.0µs/byte + 2.8µs: int(…, 2).to_bytes(…)
  • 1.0μs/ byte +2.8μs:int(...,2).to_bytes(...)

  • 2.9µs/byte + 0.2µs: (UBits(Bits(*chunk)) …)
  • 2.9μs/字节+0.2μs:( UBits(Bits(* chunk))...)

  • 16.µs/byte + 0.9µs: for and bit-twiddling
  • 16.μs/ byte +0.9μs:for和bit-twiddling

#2


0  

Using numpy, with test code and comments:

使用numpy,带有测试代码和注释:

#!/usr/bin/env python3
import numpy as np


def pack_bits(a):
    # big-endian - use '<u8' if you want little-endian
    #0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
    b = np.copy(a.view('>u8'))
    #0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
    b |= b >> 7
    #0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
    b |= b >> 14
    #0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
    b |= b >> 28
    return np.array(b, dtype='u1')

def main():
    a = []
    for i in range(256):
        # build 8-bit lists without numpy, then convert
        a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
    a = np.array(a)
    print(a)
    b = pack_bits(a)
    print(b)

if __name__ == '__main__':
    main()

Similar code exists for other deinterleaving, bit since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG does not overlap the ABCDEFGH).

由于输入之间的位数小于一个字中的字节数,因此存在其他去交错位的类似代码,我们可以避免屏蔽(注意0ABCDEFG不与ABCDEFGH重叠)。