Converting bits to bytes in Python

Date: 2022-03-19 18:16:15

I am trying to convert a bit string into a byte string, in Python 3.x. In each byte, bits are filled from high order to low order. The last byte is filled with zeros if necessary. The bit string is initially stored as a "collection" of booleans or integers (0 or 1), and I want to return a "collection" of integers in the range 0-255. By collection, I mean a list or a similar object, but not a character string: for example, the function below returns a generator.

So far, the fastest I am able to get is the following:

def bitsToBytes(a):
    s = i = 0
    for x in a:
        s += s + x
        i += 1
        if i == 8:
            yield s
            s = i = 0
    if i > 0:
        yield s << (8 - i)

I have tried several alternatives: using enumerate, building a list instead of a generator, and computing s with "(s << 1) | x" instead of the sum. Every one of them seems to be a bit slower. Since this solution is also one of the shortest and simplest I have found, I am rather happy with it.
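For reference, the "(s << 1) | x" variant mentioned above might look like this. It is a sketch: the name bits_to_bytes_shift is hypothetical, and no timing claim is made here, only that it produces the same output as the original generator.

```python
def bits_to_bytes_shift(a):
    """Pack 0/1 values (or booleans) into ints 0-255, high-order bit first."""
    s = i = 0
    for x in a:
        s = (s << 1) | x  # shift the accumulator left and OR in the new bit
        i += 1
        if i == 8:
            yield s
            s = i = 0
    if i > 0:
        yield s << (8 - i)  # pad the final partial byte with zeros on the right

print(list(bits_to_bytes_shift([1, 0, 0, 0, 0, 0, 0, 0, 1])))  # [128, 128]
```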

However, I would like to know if there is a faster solution. In particular, is there a library routine that would do the job much faster, preferably in the standard library?

Example input/output

[] -> []
[1] -> [128]
[1,1] -> [192]
[1,0,0,0,0,0,0,0,1] -> [128,128]

Here I show the examples with lists. Generators would be fine. However, strings would not, because it would then be necessary to convert back and forth between list-like data and strings.
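To compare candidates, a small harness can check a function against the example cases above and time it with the standard timeit module. This is a sketch: the helper names EXAMPLES, check and time_it, the input size and the repeat count are arbitrary assumptions, and no timings are claimed here.

```python
import random
import timeit

def bitsToBytes(a):
    # the generator from the question, copied unchanged
    s = i = 0
    for x in a:
        s += s + x
        i += 1
        if i == 8:
            yield s
            s = i = 0
    if i > 0:
        yield s << (8 - i)

# (input bits, expected bytes) pairs taken from the examples above
EXAMPLES = [
    ([], []),
    ([1], [128]),
    ([1, 1], [192]),
    ([1, 0, 0, 0, 0, 0, 0, 0, 1], [128, 128]),
]

def check(func):
    """Raise AssertionError if func disagrees with any example case."""
    for bits, expected in EXAMPLES:
        got = list(func(bits))
        assert got == expected, (bits, got)

def time_it(func, n_bits=100_000, number=10):
    """Total seconds for `number` runs of func on random bits (sizes are arbitrary)."""
    data = [random.randint(0, 1) for _ in range(n_bits)]
    return timeit.timeit(lambda: list(func(data)), number=number)

check(bitsToBytes)
print(f"{time_it(bitsToBytes):.3f} s")
```

Any of the answers below can be passed to check and time_it in place of bitsToBytes, as long as it accepts a list and returns an iterable of ints.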

3 solutions

#1

Step 1: Append zeros at the end until the length is a multiple of 8

Step 2: Concatenate into a single string (the bits are already ordered from high to low, so no reversal is needed)

Step 3: Save off 8 bits at a time into a list of ints

Step 4: ???

Step 5: Profit

def bitsToBytes(a):
    a = list(a)  # also accept generators and other iterables
    a += [0] * (-len(a) % 8)  # pad at the end up to a multiple of 8 bits (no padding if already a multiple)
    s = ''.join(str(int(x)) for x in a)  # join all bits; int() turns True/False into 1/0
    returnInts = []
    for i in range(0, len(s), 8):
        returnInts.append(int(s[i:i+8], 2))  # goes 8 bits at a time to save as ints
    return returnInts
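To make the steps concrete, here is a trace of the intermediate values for the example input [1,0,0,0,0,0,0,0,1]. It assumes the padding is appended at the end with no bit reversal, because the question fills each byte from high order to low order; -len(bits) % 8 pads by 0 when the length is already a multiple of 8.

```python
bits = [1, 0, 0, 0, 0, 0, 0, 0, 1]

# Step 1: append zeros until the length is a multiple of 8
padded = bits + [0] * (-len(bits) % 8)
# Step 2: join into one string, most significant bit first
s = ''.join(str(x) for x in padded)
# Step 3: cut into 8-character chunks and parse each one as a base-2 int
chunks = [s[i:i + 8] for i in range(0, len(s), 8)]
values = [int(chunk, 2) for chunk in chunks]

print(padded)  # [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(s)       # 1000000010000000
print(chunks)  # ['10000000', '10000000']
print(values)  # [128, 128]
```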

#2

The simplest tactic is to consume the bits in chunks of 8, catching StopIteration to detect the end of the input and padding with zeros:

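def getbytes(bits):
    bits = iter(bits)  # accept any iterable; iter() of an iterator returns it unchanged
    done = False
    while not done:
        byte = 0
        count = 0  # number of real (non-padding) bits in this byte
        for _ in range(8):
            try:
                bit = next(bits)
            except StopIteration:
                done = True
                break
            byte = (byte << 1) | bit
            count += 1
        if count:
            yield byte << (8 - count)  # zero-pad a short final byte; emit nothing for an empty one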

Usage:

lst = [1,0,0,0,0,0,0,0,1]
for b in getbytes(iter(lst)):
    print(b)

getbytes is a generator and consumes its input iterator lazily, so it works fine with large and potentially infinite streams.
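The same chunk-of-8 idea can also be written with itertools.islice, which takes up to 8 bits at a time from a shared iterator. This is a swapped-in variant rather than the answer's code, and the name getbytes_islice is hypothetical. The last line illustrates the infinite-stream claim by reading only the first three bytes of an endless 1, 0, 1, 0, ... stream.

```python
from itertools import cycle, islice

def getbytes_islice(bits):
    """Yield ints 0-255 from an iterable of 0/1 values, zero-padding the last byte."""
    it = iter(bits)
    while True:
        chunk = list(islice(it, 8))  # take up to 8 bits from the shared iterator
        if not chunk:
            return  # input ended on a byte boundary: no extra zero byte
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        yield byte << (8 - len(chunk))  # left-align a short final chunk

# An infinite stream 1, 0, 1, 0, ...: take only the first three bytes.
print(list(islice(getbytes_islice(cycle([1, 0])), 3)))  # [170, 170, 170]
```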

#3

Using the grouper() recipe from the itertools documentation:

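from functools import reduce
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

bits = [1, 0, 0, 0, 0, 0, 0, 0, 1]  # input: 0/1 values (booleans also work)
# fold each zero-padded group of 8 bits into one int, high-order bit first
# (named result rather than bytes to avoid shadowing the built-in)
result = [reduce(lambda byte, bit: byte << 1 | bit, eight_bits)
          for eight_bits in grouper(bits, 8, fillvalue=0)]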
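To see what the reduce step does to a single group, itertools.accumulate can run the same lambda while keeping every intermediate value; the last value is what reduce returns. The group (1, 1, 0, 0, 0, 0, 0, 0) is just an illustrative input.

```python
from itertools import accumulate

eight_bits = (1, 1, 0, 0, 0, 0, 0, 0)
# same folding function as the reduce call, but every partial result is kept
steps = list(accumulate(eight_bits, lambda byte, bit: byte << 1 | bit))
print(steps)  # [1, 3, 6, 12, 24, 48, 96, 192]
```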

Example

[] -> []
[1] -> [128]
[1, 1] -> [192]
[1, 0, 0, 0, 0, 0, 0, 0, 1] -> [128, 128]

If input is a string then a specialized solution might be faster:

>>> bits = '100000001'
>>> padded_bits = bits + '0' * (8 - len(bits) % 8)
>>> padded_bits
'1000000010000000'
>>> list(int(padded_bits, 2).to_bytes(len(padded_bits) // 8, 'big'))
[128, 128]

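Note that this appends a full byte of zeros when len(bits) % 8 == 0. Padding with '0' * (-len(bits) % 8) avoids that, but an empty string then needs special handling, since int('', 2) raises ValueError.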
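A sketch that avoids the extra zero byte and also accepts list-like input by building the string first. The name bits_to_bytes_int is hypothetical, and whether the string round trip beats the question's generator depends on the input, so it is worth timing on real data. It expects 0/1 values or booleans, not a '0'/'1' string (every non-empty character is truthy).

```python
def bits_to_bytes_int(bits):
    """Pack an iterable of 0/1 values (or booleans) into a list of ints 0-255."""
    s = ''.join('1' if b else '0' for b in bits)  # list-like input -> '0'/'1' string
    if not s:
        return []  # int('', 2) would raise ValueError
    s += '0' * (-len(s) % 8)  # pad on the right only up to the next multiple of 8
    return list(int(s, 2).to_bytes(len(s) // 8, 'big'))

print(bits_to_bytes_int([1, 0, 0, 0, 0, 0, 0, 0, 1]))  # [128, 128]
print(bits_to_bytes_int([1] * 8))  # [255]
```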
