Convert a variable-sized byte array to an integer/long

Time: 2022-09-15 18:26:49

How can I convert a (big-endian) variable-sized binary byte array to an (unsigned) integer/long? For example, '\x11\x34' represents 4404.

Right now, I'm using

def bytes_to_int(bytes):
    # Python 2 only: byte strings have no .encode method in Python 3
    return int(bytes.encode('hex'), 16)

Which is small and somewhat readable, but probably not very efficient. Is there a better (more obvious) way?

2 solutions

#1

Python doesn't traditionally have much use for "numbers in big-endian C layout" that are too big for C. (If you're dealing with 2-byte, 4-byte, or 8-byte numbers, then struct.unpack is the answer.)
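
For the fixed-size case, a minimal illustration (the sample inputs are made up): the '>' prefix selects big-endian order with standard sizes, and 'H', 'I', 'Q' are unsigned 2-, 4-, and 8-byte integers. Note that the input length must match the format exactly.

```python
import struct

# '>' = big-endian, standard sizes; 'H' = unsigned 2 bytes, 'I' = 4, 'Q' = 8.
# struct.unpack always returns a tuple, even for a single value.
(two,) = struct.unpack('>H', b'\x11\x34')
(four,) = struct.unpack('>I', b'\x00\x00\x11\x34')
(eight,) = struct.unpack('>Q', b'\x00' * 6 + b'\x11\x34')

print(two, four, eight)  # 4404 4404 4404

# A length mismatch is an error rather than a silent conversion.
try:
    struct.unpack('>I', b'\x11\x34')
except struct.error as exc:
    print('struct.error:', exc)
```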

But enough people got sick of there not being one obvious way to do this that Python 3.2 added a method int.from_bytes that does exactly what you want:

int.from_bytes(b, byteorder='big', signed=False)
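
Applied to the example from the question, plus a few behaviors worth knowing (the variable names are just for illustration):

```python
# int.from_bytes is a classmethod, available since Python 3.2.
value = int.from_bytes(b'\x11\x34', byteorder='big', signed=False)
print(value)  # 4404, i.e. 0x11 * 256 + 0x34 = 17 * 256 + 52

# Any bytes-like object is accepted, with no length limit.
big = int.from_bytes(bytearray(range(256)), byteorder='big')

# Empty input converts to 0.
zero = int.from_bytes(b'', byteorder='big')

# signed=True reads the bytes as two's complement: 0xfffe -> -2.
neg = int.from_bytes(b'\xff\xfe', byteorder='big', signed=True)

# int.to_bytes is the inverse; the caller has to supply the length.
back = value.to_bytes(2, byteorder='big')  # == b'\x11\x34'
```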

Unfortunately, if you're using an older version of Python, you don't have this. So, what options do you have? (Besides the obvious one: update to 3.2, or, better, 3.4…)


First, there's your code. I think binascii.hexlify is a better way to spell it than .encode('hex'), because "encode" has always seemed a little weird for a method on byte strings (as opposed to Unicode strings), and it's in fact been banished in Python 3. But otherwise, it seems pretty readable and obvious to me. And it should be pretty fast—yes, it has to create an intermediate string, but it's doing all the looping and arithmetic in C (at least in CPython), which is generally an order of magnitude or two faster than in Python. Unless your bytearray is so big that allocating the string will itself be costly, I wouldn't worry about performance here.
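
If the same code has to run both with and without int.from_bytes, one option is a feature test at import time that falls back to the hexlify spelling. This is only a sketch; the name bytes_to_int is reused from the question. One edge case: hexlify of an empty input is an empty string, and int('', 16) raises ValueError, whereas int.from_bytes returns 0, so the fallback special-cases it.

```python
import binascii

if hasattr(int, 'from_bytes'):
    # Python 3.2+: the built-in does this conversion directly in C.
    def bytes_to_int(b):
        return int.from_bytes(b, byteorder='big', signed=False)
else:
    # Older Pythons: hex-encode in C, then parse the hex digits as base 16.
    def bytes_to_int(b):
        h = binascii.hexlify(b)
        # int('', 16) raises ValueError, so map empty input to 0 like from_bytes does.
        return int(h, 16) if h else 0
```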

Alternatively, you could do it in a loop. But that's going to be more verbose and, at least in CPython, a lot slower.

You could try to eliminate the explicit loop for an implicit one, but the obvious function to do that is reduce, which is considered un-Pythonic by part of the community—and of course it's going to require calling a function for each byte.

You could unroll the loop or reduce by breaking the input into 8-byte chunks and looping over struct.unpack_from, or by doing one big struct.unpack('>' + 'Q'*(len(b)//8) + 'B'*(len(b)%8), b) and looping over the result, but that makes it a lot less readable and probably not that much faster.
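
A sketch of the one-big-unpack variant, for illustration rather than as a recommendation; chunked_unpack is a hypothetical name. Because the format is big-endian and the leftover bytes come last, the accumulator is shifted 64 bits per word and then 8 bits per trailing byte.

```python
import struct

def chunked_unpack(b):
    """Big-endian, unsigned: read 8-byte words with struct, then any leftover bytes."""
    n_words, n_tail = divmod(len(b), 8)
    # '>' = big-endian with standard sizes: 'Q' is 8 bytes, 'B' is 1 byte.
    fmt = '>' + 'Q' * n_words + 'B' * n_tail
    values = struct.unpack(fmt, b)
    x = 0
    # Earlier words are more significant: shift in 64 bits per word...
    for word in values[:n_words]:
        x = (x << 64) | word
    # ...then 8 bits per trailing byte.
    for byte in values[n_words:]:
        x = (x << 8) | byte
    return x
```

Whether this beats hexlify depends on input size and interpreter; it has not been timed here.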

You could use NumPy… but its native integer types top out at 64 bits, so if you're going bigger than that, it's going to end up converting everything to Python objects anyway.

So, I think your current code is the best option.


Here are some timings comparing it to the most obvious manual conversion:

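import binascii
import functools
import numpy as np

def hexint(b):
    # Hex-encode in C, then let int() parse the digits as base 16.
    return int(binascii.hexlify(b), 16)

def loop1(b):
    # Implicit loop: fold each byte into the accumulator with reduce.
    def f(x, y): return (x << 8) | y
    return functools.reduce(f, b, 0)

def loop2(b):
    # Explicit loop: shift the accumulator left one byte, then OR in the next byte.
    x = 0
    for c in b:
        x <<= 8
        x |= c
    return x

def numpily(b):
    # Object dtype keeps every product an arbitrary-precision Python int.
    n = np.array(list(b), dtype=object)
    # Each byte position is worth a power of 256 (a shift of 8 bits), not a power of 2.
    p = 1 << (8 * np.arange(len(b) - 1, -1, -1, dtype=object))
    return np.sum(n * p)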

In [226]: b = bytearray(range(256))

In [227]: %timeit hexint(b)
1000000 loops, best of 3: 1.8 µs per loop

In [228]: %timeit loop1(b)
10000 loops, best of 3: 57.7 µs per loop

In [229]: %timeit loop2(b)
10000 loops, best of 3: 46.4 µs per loop

In [283]: %timeit numpily(b)
10000 loops, best of 3: 88.5 µs per loop

For comparison, in Python 3.4:

In [17]: %timeit hexint(b)
1000000 loops, best of 3: 1.69 µs per loop

In [17]: %timeit int.from_bytes(b, byteorder='big', signed=False)
1000000 loops, best of 3: 1.42 µs per loop

So, your method is still pretty fast…
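
To rerun a comparison like this outside IPython, a rough sketch with the standard timeit module. Absolute numbers will vary with machine and Python version, so none are claimed here; from_bytes is just a local wrapper name.

```python
import binascii
import timeit

def hexint(b):
    return int(binascii.hexlify(b), 16)

def from_bytes(b):
    return int.from_bytes(b, byteorder='big', signed=False)

b = bytearray(range(256))

if __name__ == '__main__':
    number = 10000
    for name, func in [('hexint', hexint), ('int.from_bytes', from_bytes)]:
        # Best of 3 repeats, reported per call, like the %timeit output above.
        best = min(timeit.repeat(lambda: func(b), number=number, repeat=3))
        print('%-15s %.2f µs per loop' % (name, best / number * 1e6))
```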

#2

The function struct.unpack(...) does what you need.
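
struct.unpack works with fixed-size formats, so for variable-sized input it applies directly only when the length is bounded. A sketch for inputs of at most 8 bytes, left-padding to a single unsigned 64-bit value; unpack_up_to_8 is a hypothetical helper name.

```python
import struct

def unpack_up_to_8(b):
    """Big-endian unsigned conversion for inputs of at most 8 bytes."""
    if len(b) > 8:
        raise ValueError('this helper only handles inputs of up to 8 bytes')
    # Left-pad with zero bytes to exactly 8, then read one unsigned 64-bit value.
    padded = b'\x00' * (8 - len(b)) + bytes(b)
    (value,) = struct.unpack('>Q', padded)
    return value
```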