Python UnicodeEncodeError, even though I encoded the parameters as UTF-8

Here is my code:

def renren_get_sig(params):
    cat_params = ''.join([u'%s=%s'%(unicode(k), unicode(params[k])) for k in sorted(params)])
    sig = hashlib.md5(u"%s%s"%(unicode(cat_params), unicode(SEC_KEY))).hexdigest()
    return sig

The exception message is:

Exception Type: UnicodeEncodeError
Exception Value: 'ascii' codec can't encode characters in position 138-141: ordinal not in range(128)

The value of the params dict is as follows:

params = {
 'access_token': u'195036|6.3cf38700f.2592000.1347375600-462350295',
 'action_link': u'http://wohenchun.xxx.com',
 'action_name': u'\u6d4b\u8bd5\u4e00\u4e0b',
 'api_key': u'8c0a2cded4f84bbba4328ccba22c3374',
 'caption': u'\u7eaf\u6d01\u6307\u6570\u6d4b\u8bd5',
 'description': u'\u4e16\u754c\u8fd9\u4e48\u4e71\uff0c\u88c5\u7eaf\u7ed9\u8c01\u770b\uff1f\u5230\u5e95\u4f60\u6709\u591a\u5355\u7eaf\uff0c\u8d85\u7ea7\u5185\u6db5\u7684\u4f60\uff0c\u6562\u4e0d\u6562\u6311\u6218\u8d85\u7ea7\u5185\u6db5\u7684\u9898\u76ee?\u4e0d\u7ba1\u4f60\u6d4b\u4e0d\u6d4b\uff0c\u53cd\u6b63\u6211\u662f\u6d4b\u4e86\uff01',
 'format': u'JSON',
 'image': u'http://hdn.xnimg.cn/photos/hdn21/20120809/1440/h0dd1376.jpg',
 'message': u'\u5c3c\u?!! \u3010\u4f60\u96be\u9053\u6bd4\u6211\u66f4\u7eaf\u6d01\u4e48,\u6765\u6d4b\u8bd5\u4e00\u4e0b\u5427!\u4f20\u9001\u95e8 >>  http://wohenchun.jiongceyan.com \u3011\r\n\t\t\t\t\t\t\t\t\t\t',
 'method': u'feed.publishFeed',
 'name': u'\u4eba\u4eba\u53f2\u4e0a\u6700\u706b\u7206\u6d4b\u8bd5\u4e4b\u5355\u7eaf\u6d4b\u8bd5',
 'url': u'http://wohenchun.xxx.com',
 'v': u'1.0'}

All the key-value pairs in params are Unicode objects. Why do I still get such an exception?

Thank you!

2 Solutions

#1

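Unicode is the problem. Hashing algorithms operate on bytes, not on Unicode code points; when hashlib.md5 is given a unicode object, Python 2 implicitly converts it with the default 'ascii' codec, which fails on the Chinese characters. So you must choose an encoding and encode your unicode strings to byte strings before applying the hashing algorithm: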

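from hashlib import md5

# unicode_str stands for any unicode object, e.g. the concatenated parameters
str_to_hash = unicode_str.encode('utf-8')  # text -> UTF-8 byte string
md5(str_to_hash).hexdigest()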

This behavior was reported as an issue on the Python bug tracker; look it up for more information.
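
To make the failure concrete, here is a minimal sketch (not the asker's code) that performs by hand the ASCII conversion Python 2 applies implicitly when hashlib.md5 receives a unicode object, and then the explicit UTF-8 alternative. The variable names are made up for illustration.

```python
import hashlib

text = u'\u6d4b\u8bd5'  # non-ASCII text, like the values in params

# Python 2's hashlib.md5(text) first converts unicode to bytes with the
# default 'ascii' codec; doing that step by hand reproduces the error.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing the encoding explicitly gives well-defined bytes to hash.
utf8_bytes = text.encode('utf-8')
digest = hashlib.md5(utf8_bytes).hexdigest()
```

Here ascii_failed ends up True, while the UTF-8 path hashes without error. Note that the digest depends on the encoding chosen, so both sides of an API must agree on it.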

#2

@Rostyslav has it right: use byte strings with hashlib. May I also suggest declaring a source file encoding, so the Chinese text can be written literally for readability? Also check the message parameter: the original code had an invalid escape, \u?!!, in that string. I left it out:

# coding: utf8
import hashlib

SEC_KEY = 'salt'

params = {
    u'access_token' : u'195036|6.3cf38700f.2592000.1347375600-462350295',
    u'action_link' : u'http://wohenchun.xxx.com',
    u'action_name' : u'测试一下',
    u'api_key' : u'8c0a2cded4f84bbba4328ccba22c3374',
    u'caption' : u'纯洁指数测试',
    u'description' : u'世界这么乱，装纯给谁看？到底你有多单纯，超级内涵的你，敢不敢挑战超级内涵的题目?不管你测不测，反正我是测了！',
    u'format' : u'JSON',
    u'image' : u'http://hdn.xnimg.cn/photos/hdn21/20120809/1440/h0dd1376.jpg',
    u'message' : u'尼【你难道比我更纯洁么,来测试一下吧!传送门 >>  http://wohenchun.jiongceyan.com 】\r\n\t\t\t\t\t\t\t\t\t\t',
    u'method' : u'feed.publishFeed',
    u'name' : u'人人史上最火爆测试之单纯测试',
    u'url' : u'http://wohenchun.xxx.com',
    u'v' : u'1.0'}

def renren_get_sig(params):
    data = u''.join(u'{0}={1}'.format(k,v) for k,v in sorted(params.items()))
    return hashlib.md5(data.encode('utf8') + SEC_KEY).hexdigest()

print renren_get_sig(params)

Output:

085b14d1384ba805d2d5d5e979913b27
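
The code above is Python 2 (print statement, byte-string SEC_KEY). As a rough sketch of the same idea that should also run on Python 3, the text can be assembled first and encoded once at the end. The helper name get_sig, the secret argument, and the sample values are assumptions for illustration, not part of the Renren API.

```python
import hashlib

def get_sig(params, secret):
    """Hypothetical helper: MD5 over sorted 'key=value' pairs plus a secret."""
    # Build the text form first; keys, values and secret are assumed to be text.
    data = u''.join(u'{0}={1}'.format(k, v) for k, v in sorted(params.items()))
    # Encode everything to UTF-8 bytes in one place, right before hashing.
    payload = (data + secret).encode('utf-8')
    return hashlib.md5(payload).hexdigest()

sig = get_sig({u'v': u'1.0', u'name': u'\u6d4b\u8bd5'}, u'salt')
```

For an ASCII-only secret this produces the same bytes as data.encode('utf8') + SEC_KEY in the answer above; keeping the secret as text simply avoids mixing text and byte strings.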
