Converting a binary string to a list of integers using Python

Time: 2021-01-09 18:26:02

I am new to Python. Here's what I am trying to do:

  1. Slice a long binary string into 3-digit chunks.
  2. Store each "chunk" in a list called row.
  3. Convert each binary chunk into a number (0-7).
  4. Store the converted numbers in a new list called numbers.

Here is what I have so far:

def traverse(R):
        x = 0
        while x < (len(R) - 3):
            row = R[x] + R[x+1] + R[x+2]
            ???

Thanks for your help! It is greatly appreciated.

5 answers

#1


11  

Something like this should do it:

s = "110101001"
numbers = [int(s[i:i+3], 2) for i in range(0, len(s), 3)]
print(numbers)

The output is:

[6, 5, 1]

Breaking this down step by step, first:

>>> list(range(0, len(s), 3))
[0, 3, 6]

The range() function produces the integers starting at 0, in steps of 3, stopping before len(s). In Python 3 it returns a lazy range object, hence the list() call to show the values; in Python 2 it returns a list directly.

>>> [s[i:i+3] for i in range(0, len(s), 3)]
['110', '101', '001']

This is a list comprehension that evaluates s[i:i+3] for each i in the above range. The s[i:i+3] is a slice that selects a substring. Finally:

>>> [int(s[i:i+3], 2) for i in range(0, len(s), 3)]
[6, 5, 1]

The int(..., 2) function converts from binary (base 2, second argument) to integers.
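
A few quick checks of int() with an explicit base may help here. This is only a sketch assuming Python 3, and the helper name chunk_to_int is made up for illustration:

```python
def chunk_to_int(chunk):
    """Convert a string of '0'/'1' characters to an integer (base 2)."""
    return int(chunk, 2)

values = [chunk_to_int(c) for c in ("000", "001", "110", "111")]
print(values)  # [0, 1, 6, 7]

# Digits other than 0 and 1 are rejected with a ValueError.
try:
    chunk_to_int("102")
except ValueError as exc:
    print("invalid chunk:", exc)
```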

Note that the above code may not properly handle error conditions like an input string that is not a multiple of 3 characters in length.
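
One way to make that case explicit is sketched below, assuming Python 3. The function name to_numbers and its width and strict parameters are hypothetical, and left-padding with zeros is just one possible policy for a short input (it keeps the numeric value of the whole string but shifts the chunk boundaries):

```python
def to_numbers(bits, width=3, strict=True):
    """Split bits into width-sized chunks and convert each one from base 2.

    strict=True: raise ValueError if len(bits) is not a multiple of width.
    strict=False: left-pad bits with zeros to the next multiple of width.
    """
    remainder = len(bits) % width
    if remainder:
        if strict:
            raise ValueError(
                "length %d is not a multiple of %d" % (len(bits), width)
            )
        bits = "0" * (width - remainder) + bits
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]

print(to_numbers("110101001"))              # [6, 5, 1]
print(to_numbers("1101010", strict=False))  # padded to '001101010' -> [1, 5, 2]
```

Without such a check, the one-liner above still runs on a short trailing chunk and simply converts it as a 1- or 2-digit binary number.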

#2


7  

I'll assume that by "binary string" you actually mean a normal string (i.e. text) whose items are all '0' or '1'.

So for points 1 and 2,

row = [thestring[i:i+3] for i in range(0, len(thestring), 3)]

Of course, the last item will be only 1 or 2 characters long if len(thestring) is not an exact multiple of 3; that's inevitable ;-).

For points 3 and 4, I'd suggest building an auxiliary temp dictionary and storing it:

aux = {}
for x in range(8):
  s = format(x, 'b')
  aux[s] = x
  aux[('00'+s)[-3:]] = x

so that points 3 and 4 just become:

numbers = [aux[x] for x in row]

this dict lookup should be much faster than converting each entry on the fly.
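
Whether the lookup really wins depends on the interpreter and the input, so it is worth measuring. A rough timing sketch, assuming Python 3 and assuming every chunk is exactly 3 characters long (the names with_lookup and with_int are made up for this example):

```python
import timeit

# Assumption for this sketch: every chunk is exactly 3 characters long,
# so only the zero-padded 3-character keys are needed.
aux = {format(x, "03b"): x for x in range(8)}

thestring = "110101001" * 1000
row = [thestring[i:i + 3] for i in range(0, len(thestring), 3)]

def with_lookup():
    """Convert each chunk through the precomputed dictionary."""
    return [aux[chunk] for chunk in row]

def with_int():
    """Convert each chunk on the fly with int(chunk, 2)."""
    return [int(chunk, 2) for chunk in row]

# Both approaches must agree before their timings mean anything.
assert with_lookup() == with_int()

print("dict lookup:", timeit.timeit(with_lookup, number=100))
print("int(x, 2):  ", timeit.timeit(with_int, number=100))
```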

Edit: it's been suggested I explain why I am making two entries into aux for each value of x. The point is that s may be of any length from 1 to 3 characters, and for the short lengths I do want two entries -- one with s as it is (because as I mentioned the last item in row may well be shorter than 3...), and one with it left-padded to a length of 3 with 0s.

The sub-expression ('00'+s)[-3:] computes "s left-padded with '0's to a length of 3" by taking the last 3 characters (that's the [-3:] slicing part) of the string obtained by placing zeros to the left of s (that's the '00'+s part). If s is already 3 characters long, the whole subexpression will equal s so the assignment to that entry of aux is useless but harmless, so I find it simpler to not even bother checking (prepending an if len(s)<3: would be fine too, matter of taste;-).
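
For comparison, the same padded key can be produced with str.zfill or directly with a format spec. A small sketch, assuming Python 3 (the name padded_forms is just for illustration):

```python
for x in range(8):
    s = format(x, "b")            # shortest binary form, e.g. '1' or '101'
    padded = ("00" + s)[-3:]      # the slicing trick described above
    assert padded == s.zfill(3)   # str.zfill left-pads with '0'
    assert padded == format(x, "03b")  # width and zero fill in the format spec

padded_forms = [format(x, "03b") for x in range(8)]
print(padded_forms)
# ['000', '001', '010', '011', '100', '101', '110', '111']
```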

There are other approaches (e.g. formatting x again if needed) but this is hardly the crux of the code (it executes just 8 times to build up the auxiliary "lookup table", after all;-), so I didn't pay it enough attention.

...nor did I unit-test it, so it has a bug in one obscure corner case. Can you see it...?

Suppose row has '01' as the last entry: THAT key, after my code above has built aux, will not be present in aux (both '1' and '001' WILL be, but that's scanty consolation;-). In the code above I use the original s, '1', and the length-three padded version, '001', but the intermediate length-two padded version, oops, got overlooked;-).

So, here's a RIGHT way to do it...:

aux = {}
for x in range(8):
  s = format(x, 'b')
  aux[s] = x
  while len(s) < 3:
    s = '0' + s
    aux[s] = x

...no doubt simpler and more obvious, but, even more importantly, CORRECT;-).
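
To see the corrected table handle the corner case, here is an end-to-end sketch assuming Python 3. The sample input is made up so that the last chunk is '01':

```python
# Build the lookup table with every zero-padded form up to 3 characters.
aux = {}
for x in range(8):
    s = format(x, "b")
    aux[s] = x
    while len(s) < 3:
        s = "0" + s
        aux[s] = x

thestring = "11010100101"  # 11 characters, so the last chunk is '01'
row = [thestring[i:i + 3] for i in range(0, len(thestring), 3)]
numbers = [aux[chunk] for chunk in row]

print(row)      # ['110', '101', '001', '01']
print(numbers)  # [6, 5, 1, 1]
```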

#3


1  

If you're dealing with processing raw data of any kind, I'd like to recommend the excellent bitstring module:

>>> import bitstring
>>> bits = bitstring.Bits('0b110101001')
>>> [b.uint for b in bits.cut(3)]
[6, 5, 1]

Description from the home page:

A Python module that makes the creation, manipulation and analysis of binary data as simple and natural as possible.

Bitstrings can be constructed from integers, floats, hex, octal, binary, bytes or files. They can also be created and interpreted using flexible format strings.

Bitstrings can be sliced, joined, reversed, inserted into, overwritten, etc. with simple methods or using slice notation. They can also be read from, searched and replaced, and navigated in, similar to a file or stream.

Internally the bit data is efficiently stored in byte arrays, the module has been optimized for speed, and excellent code coverage is given by over 400 unit tests.

#4


0  

Great answers from Greg and Alex! List comprehensions and slicing are so pythonic! For short input strings I wouldn't bother with the dictionary lookup trick, but if the input string were longer, I would, as well as using gen-exps rather than list-comps, i.e.:

row = list(thestring[i:i+3] for i in range(0, len(thestring), 3))

and

numbers = list(aux[x] for x in row)

since gen-exps can perform better on long inputs, although wrapping one in list() still builds the whole list, so the gain is worth measuring.

#5


0  

Wouldn't this be easier:

(I wanted an array of the upper 3 bits of a variable that contained the integer 29)

Initialize your variables and arrays first:

a = ''

b = []

I took this from a really good example on this forum. It formats the integer 29 as 5 bits (bits zero through four) and puts the resulting string of bits into the string variable "a". [edited] I needed to change the format from 0:5b to 0:05b so that integers needing fewer than 5 bits (that is, < 16) are padded with zeros rather than spaces.

a = '{0:05b}'.format(29)

look at your string variable

a

'11101'

split your string into an array

b[0:3] = a[0:3]

this is exactly what I wanted.

b

['1', '1', '1']
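
The same steps can be collected into one snippet. This is a sketch assuming Python 3 and a value that fits in 5 bits; the names upper_as_chars, upper_as_ints and upper_value are made up for illustration, and the right shift is an alternative that skips the string entirely:

```python
value = 29
bits = "{0:05b}".format(value)   # '11101'

# Upper 3 bits as characters, matching the list built above.
upper_as_chars = list(bits[:3])  # ['1', '1', '1']

# The same bits as integers, if numbers are more convenient.
upper_as_ints = [int(ch) for ch in bits[:3]]  # [1, 1, 1]

# For a 5-bit value, shifting right by 2 drops the 2 low bits.
upper_value = value >> 2         # 7, i.e. 0b111

print(bits, upper_as_chars, upper_as_ints, upper_value)
```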
