Splitting a list of bytes into a list of dicts

Time: 2022-05-09 20:44:10

I have some byte data (say for an image):

00 19 01 21 09 0f 01 15 .. FF

I parse it and store it as a byte list:

[b'\x00', b'\x19', b'\x01', b'\x21', b'\x09', b'\x0f', b'\x01', b'\x15', ...]

These are RGBA values (little-endian, 2 bytes each) that I need to parse into a list of dicts, as follows:

[{'red':0x1900, 'green':0x2101, 'blue':0x0f09, 'alpha':0x1501}, {'red':...},...]

Note: The image data terminates once we reach a 0xff. Values can be stored in hex or decimal, doesn't matter as long as it's consistent.

My attempt:

from itertools import takewhile

# our dict keys
keys = ['red', 'green', 'blue', 'alpha']

# first, grab all bytes until we hit 0xff
img = list(takewhile(lambda x: x != b'\xFF', bitstream))

# traverse img 2 bytes at a time and join them
rgba = []
for i, j in zip(img[0::2], img[1::2]):
    rgba.append(b''.join([j, i]))  # j first since byteorder is 'little'

So far it will output [0x1900, 0x2101, 0x0f09, ...]

Now I'm stuck on how to create the list of dicts "pythonically". I can simply use a for loop and pop 4 items from the list at a time but that's not really using Python's features to their potential. Any advice?

Note: this is just an example, my keys can be anything (not related to images). Also overlook any issues with len(img) % len(keys) != 0.

3 Solutions

#1


First, use StringIO to create a file-like object from the bitstream to facilitate grabbing 8-byte chunks one at a time. Then, use struct.unpack to convert each 8-byte chunk into a tuple of 4 integers, which we zip with the tuple of keys to create a list that can be passed directly to dict. All this is wrapped in a list comprehension to create rgba in one pass.

(I also use functools.partial and itertools.imap to improve readability.)

import StringIO
import re
import struct
from itertools import imap
from functools import partial

keys = ("red", "green", "blue", "alpha")
# Create an object we can read from (Python 2). Everything from the first
# 0xff onward is dropped; (?s) lets "." match newline bytes as well.
str_iter = StringIO.StringIO(re.sub("(?s)\xff.*", "", bitstream))
# A callable which reads 8 bytes at a time from str_iter
read_8_bytes = partial(str_iter.read, 8)
# Convert an 8-byte string into a tuple of 4 integer values
unpack_rgba = partial(struct.unpack, "<HHHH")
# An iterable of 8-byte strings
chunk_iter = iter(read_8_bytes, '')
# Map unpack_rgba over the iterator to get an iterator of 4-tuples,
# then zip each 4-tuple with the key tuple to create the desired dict
rgba = [dict(zip(keys, rgba_values))
         for rgba_values in imap(unpack_rgba, chunk_iter)]

(If you are getting the binary data with something like

with open('somefile', 'rb') as fh:
    bitstream = fh.read()

then you can use the file object fh in place of str_iter, so that you only read bytes from the file as you need them, rather than all at once.)
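
The code above targets Python 2 (StringIO and itertools.imap do not exist under those names in Python 3). A minimal Python 3 sketch of the same struct-based idea follows, assuming bitstream is a single bytes object rather than a list of one-byte objects. The function name parse_rgba is made up for this illustration. Like the regex above, it cuts at the first 0xff byte, even if that byte happens to fall inside a 16-bit value.

```python
import struct

KEYS = ("red", "green", "blue", "alpha")

def parse_rgba(bitstream, keys=KEYS):
    """Return one dict per record of little-endian unsigned 16-bit values."""
    # Keep only the bytes before the first 0xff terminator, if there is one.
    payload = bitstream.split(b"\xff", 1)[0]
    # One 'H' (unsigned 16-bit) per key; '<' selects little-endian.
    record = struct.Struct("<%dH" % len(keys))
    # Drop a trailing partial record, which the question says to overlook.
    usable = len(payload) - len(payload) % record.size
    return [dict(zip(keys, values))
            for values in record.iter_unpack(payload[:usable])]

data = bytes([0x00, 0x19, 0x01, 0x21, 0x09, 0x0f, 0x01, 0x15, 0xff])
rgba = parse_rgba(data)
```

If the input really is a list of one-byte objects, b''.join(img) turns it into the bytes object this sketch expects.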

#2


Maybe instead of

rgba = []
for i, j in zip(img[0::2], img[1::2]):
    rgba.append(b''.join([j, i]))  # j first since byteorder is 'little'

You can simplify it to

rgba = [b''.join([j,i]) for i,j in zip(img[0::2], img[1::2])]

Now you need to chunkify your list, so you can maybe borrow a recipe from this link, then get:

dict_list = [dict(zip(keys, chunk)) for chunk in chunks(rgba, 4)]

e.g.

>>> keys = ['red', 'green', 'blue', 'alpha']
>>> test  = [b'\x19\x00', b'\x21\x01', b'\x0f\x09', b'\x15\x01']
>>> dict(zip(keys, test))
{'blue': '\x0f\t', 'alpha': '\x15\x01', 'green': '!\x01', 'red': '\x19\x00'}
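
The chunks helper used above comes from the linked recipe, which is not reproduced here. As a stand-in, a minimal sketch of such a helper might look like this; the implementation is an assumption, not necessarily what the link contains, and the sample rgba values are invented for the demonstration.

```python
def chunks(seq, size):
    """Yield successive slices of seq, each size items long.

    Hypothetical stand-in for the linked recipe; the final slice
    may be shorter if len(seq) is not a multiple of size.
    """
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

keys = ['red', 'green', 'blue', 'alpha']
# Two pixels' worth of already-joined two-byte values (sample data).
rgba = [b'\x19\x00', b'\x21\x01', b'\x0f\x09', b'\x15\x01',
        b'\x1a\x01', b'\x22\x02', b'\x10\x0a', b'\x16\x02']
dict_list = [dict(zip(keys, chunk)) for chunk in chunks(rgba, 4)]
```

Because zip stops at the shorter input, a short final chunk would simply yield a dict with fewer keys rather than raise an error.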

#3


Without getting too fancy, you could do it very efficiently like this:

try:
    from itertools import izip
except ImportError:  # Python 3
    izip = zip

def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

img  = [b'\x00', b'\x19', b'\x01', b'\x21', b'\x09', b'\x0f', b'\x01', b'\x15',
        b'\x01', b'\x1a', b'\x02', b'\x22', b'\x0a', b'\x10', b'\x02', b'\x16',
        b'\xff']

keys = ['red', 'green', 'blue', 'alpha']
list_of_dicts = [dict(izip(keys, group))
                    for group in grouper(4, (j+i for i,j in grouper(2, img)))]

for value in list_of_dicts:
    print(value)
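
The dicts built above hold two-byte bytes objects rather than integers. If integer values are wanted, as in the question's target format, a Python 3 sketch along these lines could be used. The helper name pairs_to_ints is invented for this example, and int.from_bytes is not available in Python 2.

```python
from itertools import takewhile

def pairs_to_ints(byte_list):
    """Combine consecutive one-byte objects into little-endian integers."""
    # Stop at the 0xff terminator, as described in the question.
    data = b''.join(takewhile(lambda b: b != b'\xff', byte_list))
    # Step through two bytes at a time; an odd trailing byte is ignored.
    return [int.from_bytes(data[i:i + 2], 'little')
            for i in range(0, len(data) - 1, 2)]

img = [b'\x00', b'\x19', b'\x01', b'\x21', b'\x09', b'\x0f', b'\x01', b'\x15',
       b'\x01', b'\x1a', b'\x02', b'\x22', b'\x0a', b'\x10', b'\x02', b'\x16',
       b'\xff']
keys = ['red', 'green', 'blue', 'alpha']

values = pairs_to_ints(img)
# Slice the flat list of integers into groups of len(keys).
list_of_dicts = [dict(zip(keys, values[i:i + len(keys)]))
                 for i in range(0, len(values), len(keys))]
```

Printing a value with hex(), for example hex(list_of_dicts[0]['green']), shows it in the hexadecimal form used in the question.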
