This question already has an answer here:
- Best way to convert string to bytes in Python 3? 5 answers
I am new to Python 3, coming from Python 2, and I am a bit confused about Unicode fundamentals. I've read some good posts that made it all much clearer; however, I see there are two methods in Python 3 that handle encoding and decoding, and I'm not sure which one to use.
So the idea in Python 3 is that every string is Unicode and can be encoded and stored as bytes, or decoded back into a Unicode string again.
But there are two ways to do it: u'something'.encode('utf-8') will generate b'bytes', but so does bytes(u'something', 'utf-8'). And b'bytes'.decode('utf-8') seems to do the same thing as str(b'', 'utf-8').
Now my question is: why are there two methods that seem to do the same thing, and is either better than the other (and why)? I've been trying to find an answer to this on Google, but no luck.
>>> original = '27岁少妇生孩子后变老'
>>> type(original)
<class 'str'>
>>> encoded = original.encode('utf-8')
>>> print(encoded)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> type(encoded)
<class 'bytes'>
>>> encoded2 = bytes(original, 'utf-8')
>>> print(encoded2)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> type(encoded2)
<class 'bytes'>
>>> print(encoded+encoded2)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x8127\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> decoded = encoded.decode('utf-8')
>>> print(decoded)
27岁少妇生孩子后变老
>>> decoded2 = str(encoded2, 'utf-8')
>>> print(decoded2)
27岁少妇生孩子后变老
>>> type(decoded)
<class 'str'>
>>> type(decoded2)
<class 'str'>
>>> print(str(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81', 'utf-8'))
27岁少妇生孩子后变老
>>> print(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'.decode('utf-8'))
27岁少妇生孩子后变老
3 answers
#1
38
Neither is better than the other; they do exactly the same thing. However, using .encode() and .decode() is the more common way to do it. It is also compatible with Python 2.
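A minimal sketch of that equivalence, together with one case where the constructor forms behave differently: when the encoding argument is left out. The sample string and the variable names are illustrative only and do not come from the question.

```python
text = 'caf\u00e9'  # illustrative sample string ('café')

# The method form and the constructor form give equal results.
as_method = text.encode('utf-8')
as_ctor = bytes(text, 'utf-8')
assert as_method == as_ctor == b'caf\xc3\xa9'
assert as_method.decode('utf-8') == str(as_ctor, 'utf-8') == text

# Both forms accept the same optional error handler.
assert text.encode('ascii', 'replace') == bytes(text, 'ascii', 'replace') == b'caf?'

# Without an encoding, the constructors stop encoding or decoding:
# bytes(str) raises TypeError ...
try:
    bytes(text)
    raised = False
except TypeError:
    raised = True

# ... and str(bytes) returns the repr of the bytes object, not decoded text.
repr_not_decoded = str(as_ctor)
```

So the two spellings agree whenever an encoding is passed explicitly; .encode() and .decode() simply cannot be called in the "forgot the encoding" way shown at the end.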
#2
10
To add to Lennart Regebro's answer, there is even a third way that can be used:
encoded3 = str.encode(original, 'utf-8')
print(encoded3)
Anyway, it is actually exactly the same as the first approach. It may also look as if the second way is syntactic sugar for the third approach.
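To make the relation between the first and third forms concrete, here is a small sketch. It assumes nothing beyond the built-in str type; the sample values are illustrative.

```python
original = 'abc'  # illustrative value

# Calling the method through the class passes the instance explicitly.
via_instance = original.encode('utf-8')
via_class = str.encode(original, 'utf-8')
assert via_instance == via_class == b'abc'

# The class-level form can be handy where a plain callable is expected.
words = ['a', 'b\u00e9']
encoded_words = list(map(str.encode, words))  # str.encode defaults to UTF-8
assert encoded_words == [b'a', b'b\xc3\xa9']
```

The class-level spelling is mostly useful in places like map() or as a key function, where a callable is needed rather than a call on one particular string.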
A programming language is a means to express abstract ideas formally, to be executed by the machine. A programming language is considered good if it contains the constructs that one needs. Python is a hybrid language, i.e. more natural and more versatile than pure OO or pure procedural languages. Sometimes functions are more appropriate than object methods, sometimes the reverse is true. It depends on the mental picture of the problem being solved.
Anyway, the feature mentioned in the question is probably a by-product of the language implementation/design. In my opinion, this is a nice example that shows an alternative way of thinking about technically the same thing.
In other words, calling an object method means thinking in terms of "let the object give me the wanted result". Calling a function as the alternative means "let the outer code process the passed argument and extract the wanted value".
The first approach emphasizes the ability of the object to do the task on its own; the second approach emphasizes the ability of a separate algorithm to extract the data. Sometimes the separate code may be so specialized that it is not wise to add it as a general method to the class of the object.
#3
6
To add to the previous answer, there is even a fourth way that can be used:
import codecs
encoded4 = codecs.encode(original, 'utf-8')
print(encoded4)
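A short sketch of how the codecs functions line up with the method forms. The sample value is illustrative, and the last part relies on the standard library's 'hex_codec' bytes-to-bytes codec, which is an addition here and not something the answer itself mentions.

```python
import codecs

original = 'abc'  # illustrative value

# For text encodings, codecs.encode matches str.encode.
encoded4 = codecs.encode(original, 'utf-8')
assert encoded4 == original.encode('utf-8') == b'abc'

# codecs.decode is the matching function for the reverse direction.
assert codecs.decode(encoded4, 'utf-8') == original

# Unlike str.encode, the codecs functions also accept bytes-to-bytes codecs.
hexed = codecs.encode(b'abc', 'hex_codec')
assert hexed == b'616263'
assert codecs.decode(hexed, 'hex_codec') == b'abc'
```

For plain text-to-bytes conversion this adds nothing over .encode(); the function form is mainly relevant when the codec name is not a text encoding.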