There appear to be two different ways to convert a string to bytes, as seen in the answers to TypeError: 'str' does not support the buffer interface
Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?
b = bytes(mystring, 'utf-8')
b = mystring.encode('utf-8')
5 Solutions
#1
326
If you look at the docs for bytes, it points you to bytearray:
bytearray([source[, encoding[, errors]]])
Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.
The optional source parameter can be used to initialize the array in a few different ways:
If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().
If it is an integer, the array will have that size and will be initialized with null bytes.
If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.
Without an argument, an array of size 0 is created.
So bytes can do much more than just encode a string. It's Pythonic that it would allow you to call the constructor with any type of source parameter that makes sense.
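As a rough illustration of the source types listed in the quoted documentation, the sketch below calls the bytes constructor with each kind of argument. The variable names are made up for this example and are not from the original answer.

```python
# Each call below exercises one of the documented source types.
from_text = bytes("hi", "utf-8")          # string: an encoding is required
from_size = bytes(3)                      # integer: that many null bytes
from_buffer = bytes(bytearray(b"hi"))     # object supporting the buffer interface
from_iterable = bytes([104, 105])         # iterable of integers in range(256)
empty = bytes()                           # no argument: zero-length result

# Passing a string without an encoding is rejected with a TypeError.
try:
    bytes("hi")
    missing_encoding_rejected = False
except TypeError:
    missing_encoding_rejected = True

print(from_text, from_size, from_buffer, from_iterable, empty)
```

The same calls work with bytearray in place of bytes, which yields a mutable result instead.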
For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where there is no explicit verb.
Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.
Also, see Serdalis' comment: unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding), and symmetry is nice.
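A small sketch of the two points above: the constructor and the method should produce identical bytes, and decode reverses encode. The sample text and variable names are assumptions chosen for illustration.

```python
text = "naïve café"

via_method = text.encode("utf-8")        # explicit verb: encode this string
via_constructor = bytes(text, "utf-8")   # same result through the constructor

# decode() is the inverse operation, so the round trip restores the original.
round_trip = via_method.decode("utf-8")

print(via_method == via_constructor, round_trip == text)
```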
#2
139
It's easier than you might think:
my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation
#3
31
The absolutely best way is neither of the 2, but the 3rd. The first parameter to encode defaults to 'utf-8'. Thus the best way is:
b = mystring.encode()
This will also be faster, because the default argument results not in the string "utf-8" in the C code, but NULL, which is much faster to check!
Here are some timings:
In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest.
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop
In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest.
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop
Despite the warning, the times were very stable after repeated runs; the deviation was just ~2 per cent.
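If you are not working in IPython, a comparable measurement can be sketched with the standard timeit module. The helper name best_time and the repeat and number values are assumptions for this example; absolute figures will vary by machine and Python version, so no expected timings are given here.

```python
import timeit

def best_time(stmt, repeat=5, number=100_000):
    """Return the best per-call time in seconds over several repeats."""
    return min(timeit.repeat(stmt, repeat=repeat, number=number)) / number

explicit = best_time("'abc'.encode('utf-8')")
default = best_time("'abc'.encode()")

print(f"encode('utf-8'): {explicit * 1e9:.0f} ns per call")
print(f"encode():        {default * 1e9:.0f} ns per call")
```

Taking the minimum of the repeats, as the "best of 10" lines above do, reduces the influence of unrelated system activity.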
#4
21
You can simply convert a string to bytes using:
a_string.encode()
and you can simply convert bytes to a string using:
some_bytes.decode()
bytes.decode and str.encode have encoding='utf-8' as the default value.
The following functions (taken from Effective Python) might be useful to convert str to bytes and bytes to str:
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode()  # uses 'utf-8' for encoding
    else:
        value = bytes_or_str
    return value  # Instance of bytes

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode()  # uses 'utf-8' for decoding
    else:
        value = bytes_or_str
    return value  # Instance of str
#5
8
so_string = '*'
so_bytes = so_string.encode()