Platform-independent text hashing in Python

Time: 2021-04-26 23:09:16

Is there any function in Python's stdlib or an extra package for generating a consistent hash from text, independent of the platform and architecture?

I had been using this code:

import hashlib

def get_text_hash(text):
    h = hashlib.sha512()
    if not isinstance(text, unicode):
        text = unicode(text, encoding='utf-8', errors='replace')
    h.update(text.encode('utf-8', 'replace'))
    return h.hexdigest()
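
For comparison, here is a minimal Python 3 sketch of a text hash that should not depend on platform or interpreter defaults. It assumes callers pass either str (encoded explicitly as strict UTF-8) or bytes (hashed exactly as given, never decoded and re-encoded); the name stable_text_hash is hypothetical.

```python
import hashlib


def stable_text_hash(data):
    """Return the SHA-512 hex digest of data (str or bytes).

    str input is encoded as UTF-8 with strict error handling, so the bytes
    being hashed never depend on locale or platform defaults.
    bytes input is hashed as-is: it is never decoded and re-encoded.
    """
    if isinstance(data, str):
        data = data.encode("utf-8")  # strict: fail loudly instead of altering the text
    elif not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected str or bytes, got %s" % type(data).__name__)
    return hashlib.sha512(data).hexdigest()
```

Because both branches end with a fixed sequence of bytes, the same input should give the same digest on any machine. With strict encoding, a str containing a lone surrogate raises UnicodeEncodeError rather than being silently changed.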

But I found that it returns completely different hashes for the same text on Fedora 16 vs Ubuntu 12.

EDIT: Here is code to reproduce the problem:

#!/usr/bin/python

import hashlib

def get_file_hash(fin):
    """
    Iteratively builds a file hash without loading the entire file into memory.
    """
    if isinstance(fin, basestring):
        fin = open(fin)
    h = hashlib.sha512()
    for text in fin.readlines():
        if not isinstance(text, unicode):
            text = unicode(text, encoding='utf-8', errors='replace')
        h.update(text.encode('utf-8', 'replace'))
    return h.hexdigest()

def get_text_hash(text):
    """
    Returns the hash of the given text.
    """
    h = hashlib.sha512()
    if not isinstance(text, unicode):
        text = unicode(text, encoding='utf-8', errors='replace')
    h.update(text.encode('utf-8', 'replace'))
    return h.hexdigest()

image_content = '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x002\x00\x00\x009\x08\x06\x00\x00\x00t\xf8xr\x00\x00\x00\x01sRGB\x00\xae\xce\x1c\xe9\x00\x00\x00\x06bKGD\x00\xff\x00\xff\x00\xff\xa0\xbd\xa7\x93\x00\x00\x00\tpHYs\x00\x00\x1e\xc2\x00\x00\x1e\xc2\x01n\xd0u>\x00\x00\x00\x07tIME\x07\xdc\x08\x15\r0\x02\xc7\x0f^\x14\x00\x00\x0eNIDATh\xde\xd5\x9aytTU\x9e\xc7?\xf7\xbdWkRd\x0f\x01\x12\x02!\xc8\x120,\r6\xb2#\x10\xa2(\xeeK\xdb:n\xd3\xca\xd0\xb46\x8b\xf6\xc09=\xd3v\xdb\xa7\x91\xb8\xb5L\xab\xe3\xb8\xb5\x8c\xbd\x88\x83\xda-\x84\xc4\xb84\x13\x10F\x01\r\xb2%\x01C \tY\x8a$UIj{\xef\xce\x1f\xaf\xaa\x92\x90\x80@\xc0\xa6\xef9uRU\xa9\xba\xf7\xf7\xbd\xdf\xdf\xfe+\xb8Xk\xc1\x8aD\xe6/K\x03 \x7f%\x17{)\x17kcUU/s%\xa6\xbce[\xf8h6\x9b\xd7B\xfe\x8a\x7fL \xe9)I\t\xd3\'O\x9c-\xe2\x07\xfc\x89\x9c\xd9N6\x17\\Tf.\x1a\x90\xe1\x19\xa9\xa3\xe6\x8d\x19\x84\x96>f\x02#\xe6l\x02`\xf3ZX\xb0\xe2\x1f\x08\xc8\x84\xfb\xfb\xa5\xc6\xc7.\x9c;&\x1d\xabE\x95d\x8c\x9f\xc1\r\xbfx\x95\xf4\\\x85\xc2\x82\x8b\x02F\xb9\x80\xc6M\xcc\xf5\xab\x01p\x0eI\x9f1kl\xd6\xeca)\xfd\xb0)\x8a@\xb3\n\x06\xe4\xdc\xcb\x15\xb7\xad\xe1\xca\x1f:L0\xcb\xb9\xf4V\xd7\x1b\xce\xbec\xfe\xea7\x8bO\x18R\xca\x0f\xcb\xdd\x86kU\x89d\xd9\x16\x83\xe5E\x92%\xefHnyr#\xdf\xbb%\xbd\xf3\xbb\x17\x06\x90\xda\xe7\x1d\xf2W@a\x81\xf9|\xe6OV\xad\x7fz\xc9\xb3?\xbe\xe6\xfbI\xfb\x1b\xda\xe5\xafK\xbe\x11_\xd7y\x91\x08\x01H4\x9b\xc0\x95<\x12W\xf2\x1d$\x0fm\xe6\xc8\xce\xddTl7/\xa2b[\x9f\xc4\x10\xe7\xcd@D\xf8\xa1\xb7*\xd6\x9c\xac\x01\xb9C\x07\xbc\xf8\xdc\xe2E\x0bs\xb2\xd2)>\xe4f\xcb\xc1&^\xdfy\x8c\xa0n`\xe2\x08/)%B\x08:Z\xe1D\xf9\x0e\x02\xde\xd5H\xb6q\xb2\xdaO\xe9\xef\x8d\xef\x96\x91\xc8\xed\r\xbfcX\xde\xeds\xef\xbf\xe1\xca\xb1\x7f~n\xf1\xf5cGe\xa4\xcaV\x7f\x88=5mB\x02\xfb\xeb\xdb\xf0\xfaC\xdd\x81\x880;\x16\xbb ~`:V\xe7\xdd(\xea-\xc4$\xda\xc9\xc8m\xa1b[C\xf4\xb2\xce\x81%q.*\xe4Tm\xb4\xff\xf5\t`J\xf2e\xf7\xdf\xf0\xf0\xe2k\xa7\xdc>m\xf4\x90\xec\xef\rO\x07\x90\x80\x08\xea\x06\xaf}QGY\x8d\x87\xff\xfa\xec\x18\xbe\xa0\xde\x1dH\'3 \x84\x0c\x8b 
\xf0{\xc1\xd7Z\x83\xbbz\x03u\x07\xd6\xb2m\xfd1~\xf0[p\x1f\xedd\xbf\xcf@\xf2W\x9a1\x00`\xd2\x8f\xee[ro\xfe\x13\x8f\xdd<+e@\xa2K\xd5T5\n\x02\xe0\xbd\xaf\x1b\xa9h\xea\xe0\xc5\xd2\xa3T4\xb4\x99\xef\x8a3\x1c\x13\x05\x84@J0t\x89\xbb\xba\x9e\xe3e\x8fQ\xb2\xee\r\xf3\xfc\xe5\xb0\xf9\xa9>\x00\xe9\n`\xce\xc3\xc9C3\x07\xbc\xf9\xc6\xb2[\x17L\x1f3T\x86\xd5]tj\x0b|\\\xd9\xcc\xd7\'\xdax\xb7\xac\x8e\x92\xfd\x8d\xa0\x9d\x87w7\x81A[3\xd4\xee[G[\xd3c|\xf8|;\xf9\x8f\xc2\xe6\'\xcf\xc3F\xba\x80p,Z5\xfb\x9f\x17^\xf9\xde\x7f?z\xc7\xe4Q\x83S\x91R\n\x11\x96^\x08AP\x97|v\xb4\x95/k\xbd|x\xa8\x91\xc2\xb2z\xb0\x9c\xa7C\x8c\xb0gs\x803~20\x82\xd4\xecR\x8a\x9f\xf5\x90\xff(T\x94\x9e\x03\x90\xaeL,X\xb1d\xdd\x92\x1b^\\~\xe3\x8cA\t\xb1\x0e\xd9\x95E!\x04\xed\x01\x9d\x8f\x0f7SV\xd7\xc6\xa6\x03\rl\xda\xd7\x00\xea\x05\x88\xb3RJ\xacvp\xb8r\xd0\xacS\x89M\xfa\x94\x8f_p\x9f\xce\t\xa8\xbd\xc6\x85\xcda\xe3\x9a\xb1\xf4\xf1\xa2\xb5K~y\xf3\xb4\xb1\xb16\x8b&\xc3L\x10yT7\xfb\xd8t\xc0MM\xab\x9f\xff\xf9\xea\x04\x1f\x957u\xbf\xd5>\x05\x06a\xda\x8cj\x01g|\x06\xd6\x98\xd98b?\xe0\xd3\x97[X\xb0\x1c*\xb6\x7f\x0b\x90\x08\xda\xbc\xe5K\x8b\n\x96<>o|\xb6\x85\x88\xc9\x86=g\xc8\x80]\xc7\xbd|\\\xd9\x8c\xc7\xa7\xf3\xf2\xf6j>\xafj\x06E\\\x18\x10\xdd]\xb5\x00!q\xc6\xf5\xc7\xea\x1cMB\xfa\x07|\xf4\x1f\xbeS\xc1\xa8=\x02]\xc56\x1c\x8bV\xcd}\xe6\xa1\xeb_\xbcm\xfa\xe51\x11\x00B\x08tCR\xe7\t\xf0Ie3\xbbk\xbcT6\xb5\xf3\xfc\xd6*\xaa\xdd\x1d\x17F\x9d\xce\x04HQA\xb3g#uH\xc9,\xc1\xea\xeaf/j7\xbb(,@,X\x9e\xb4h\xfa\xc4\xf7V\xdf6{\x80\xd3\xa6\xc9\x88Q\xd7\xb4\xf8\xf9\xbfc\x1evV{\xa8hj\xa7p\x7f\x03\x1b\xbf:A{P\x07\xe5"\x82\xe8b5Xl\x02_\xcb4B\xa1=\x14\xae=\xc0\x82\x95Q\r\xd2\xba~\x0e\xc0\xe9\x8a{\xfd\xba9S\x86%\xc4\xd8$ 
\xaaN\xfa\xf8\xaa\xd6K\xbd7H\xc8\x90\x94\x947Rt\xb0\x89\x96\x8eP\xf4;\xdf\xd1\x12\x08Eb\x8b\x15\xd8b~\xcfU\xff\x92I\xe1\xda\x93\x91t\xa9\xbbB\xcfX|\xfd\xac\xebn\xda\xf8\x83\xe99RS\xa4\xe8\x08\x1a\x84\x0cS\xd8\xc3M\xed\xbc\xf5E\r\r\xde\xc0\x85\xb5\x83s\x89/\x8a\nMU\x12_\xb3\xa0\xa5\xfe\x1d6\xad\xb9\xd94\x89\x95(]\xd2h\x8d\xb4\xe1\x05\xce\x84\x14\xfc\xc1\x90hj\x0bR\xef\r\xf0u\xad\x97u[\xabx\xee\xa3\xc34x\xfeN \xa2\x9eP\x824L\xe3w\xc4/`\xfe#SM\x90\x06\x1a"\xac\xdfW?v\x1f\xfd\xb32>)o\x94\r\x1e\x9f\xf0\xeb\x065->\x1a[\xfcf\xf9\xa5*\x7f?\x10\x91e\x18 u\x90R`\xb1\xc4`s=H\xca\x88mly\xcaL1\x18\xb70\x86\xb1W\xbfM\xf2\xd0|\x84\xd2\xa9\xfa\x82\x0b\x17\x17.\xc4\n\x05\xa0\xe9\x1b\x08\xf9\xcd\xd7A\xdfQ<\x8d\x0b)z\xa6\xcc\xa4\xa3\xff\x8818\xe2\'\xa1\xa8\xa6\xd0\x8a\xe8\x8c\t\x97\n\x08\x80\x90\x0f\x8c`\xe7\xe5*\xea`l\xb1S\x99\xf6O(dM\x82\xf8\x81\x13\xd0\xac\xc9\\\xaaKJ\xf3\xe1o\x03#Z\x16HT\x0b(\xcau8\x13\x1d\nC&\xd9\xd1,Wa\x8b\xe5\xbb\xf6\xa7\xe7d\xe8z\x10\xda\x9b;\xf5]J\x11f\xe5*\xac\x0e\xa7B\xbf\xfe6T\xeb,T\x15$\xe2\x92d\x03\x01\xdeFS\xad"\xaa\x1eaE\xb3X\x91r\xa2\x82\xaa%a\xb1\'\xf5\xa1\x82\xbf\xf8l\xf8\xbd\xe0m \xeaa\xbb\x06I\xd5\x02\xd2\x98\xa1\xa0\x07Ga\x8b\x01y\x89\xa9\x95\x0c\x8b\xa3\x07\xc0]\xd5\x1b\x88p\x11\xa6\x82\xa2\xe6*\x18F&\x16; \xc5%i\x17\xee\xea\xae\x06\xde{\x11\xa6hY\nR&\xa3Z/AOe@s\r\x04\xda\xbf5\x97DQ\x924\x90j\x9fc\x85\x94=\xf6\xeefo\xe7\xb2\x7f\xa4f?Y\r\xbe\x96p9"z\xdf7\x9a\xc3k\x0e\xed\xbc-\xdc\x90(\xaa`p\x82\x9d\x1bF\xa72wD"\xd9\xc9N\x12\x9d\x164E\xe1PC\x1b\x7f\xfd\xba\x9e\re\r|s\xb2\x83\x0e\x9fn\xa6:\xa7o\ru2\xe1>\n\x1d\x9e\xce\xf2 |Qi\t1\xf4\xb3\xdbh\x0f\x049\xd1\xd2N0\xa8\x9b\x81\x1bEj\x08\xbc\xe8!P\xb5\xb3\x07\xa1\x1b$\xbbl,\x9d\x96\xc1\xe2\xa9\x19\xa4\xc4\xf4T\xcd\xc9\x83\xe3\x98<8\x8eU\xf3\x86\xf1\xa7\xdd\xb5\xbc\xbc\xa3\x86\xd2\n\xb7Y\x01\x9d\n&\xf2:\xe8\x83\xe6\xe3\x10h\xeb\x04\xe1\x0f\x92\x96\x1a\xc7\x8f\xe6\x8c\xe5\xa6+\x863b`"\xc7\xdc^6|v\x88\x7f\xdb\xb0\x1d\x7f0\x04\xe0S\x199{(\xae\x94\x1b\xd1\xac\xf2\xac\x1c\xb0\x94\x0cJp\xb0\xf1\x9e\\~8q 
1V5\xda\x05\xed\xad\x7f`Q\x15\xc6\r\xea\xc7\xa2\x9cT\xfa\xc7\xd9()w#u#|\x93\x116\x14\xf0yLu\n\xf9\xcd\xd7R\x82?\xc8M\xd3G\xf3\xdaCy\xdc:e\x04\xfd\xe3b\xd0\x14\x85\xc4\x18;S/\x1bHK{\x80\xed\xfb\x8e\x81\xe0\x1b\x05E\xad4\rJ\x9c\x15\x08!\x04\x1f\xdc?\x9e)C\xe2\xbb\xb5rO_r\x9b+)\xc6\xc2\xb2\x99\x99\x1c^5\x9d\t\x83\xe3 \x14n\xf3*\nxN\x98\xc9\xa0\x1e\x0exRb\xb7j\xac{(\x8f?,\xbd\x9a\x9c\xf4\xa4\xe8Y]\xcf\xbcf\xc2\x10\x12R\xe2!\xd0Q\xa9\xa0h&\x90\xf0\x06g4\xc2\x90\xc1\xbf\xe7e\x91;\xd0\x15\xddT\x9c\x83!K)\x19\x9c`\xe7\x8b\x9f~\x9f\x9f\xe7g\x13g\x05\xea\x8f@Km\xd4\xee\xac\x9a\xc2\x15\xc3\x07\xf0\xf9\x13w\xb2d\xfe8,\x9a\xda\xe3\xac\xc8\xdf\xf4\x04\x97L\x8dsA\xc8\xbfK\xa3\xbd\xb9\x03G\xdc\x1e\xa41\xee\xcc\xadMHK\xb0\xb3\xfa\xaa\xacs\x06\xd0\xbd\xc3c\xde\xe6/\xf2\x861g\xb0\x9d\x17\x8a\xfcl\xaf\x80@\xc8`tz\x12\xb7^q\x19\x0f\xce\xbd\xbc\xdb\xcd\x9f\xee,\x87U\xc5*$\xa8\x96\x8f5\xbc\x8d>\xe2\x07}H\xd0?\x0e\x8b\xed\xf4v\x122\xb8}|\x1a\xaa"\xce\xa8N\xe7\x02f\xe6\xa8AL\xcaJ\xe5h\xa3\x07\xdd0\x18\x98\x10KB\xac\xfd[U6\xb2\xdauEH#\xd4\x82\xa1\xeeU\xd8\xfaj\x08\xbfg+\x81\xb63g[\x86d\xea\xd0\x84\xf3f\xa370\x00N\x9b\x85\x91\x83\x12\xc9\xc9H\x8e\x828\xcb3\xe4\t\x8f\x1f\x9f\xaf\xa3\x98\xc6\xea\x0e\xd3\xe76T\x96\x91\x90\xbe\x17)\xc7\x9c6\x80II\xb2\xd3\xd2\xe7.hDH\xdd\x90\x1c\xacu\xf3R\xc9Wl\xd8q\x08\xab\xaa\xf2\x93\x05\xe3\xb9gf\x0e\x89\xb1\xf6\xb3aD\x1c\xabk\xa0\xaa\xee\xe4\xfb\xecx\xc5g\x02\xd9\xfa\xda\x11R\x87\x97\xe0J\x19\x8d\xa2\x89^\x99\x11\x82\x86\xb6`\x9f@D\x04\xab\xa8;\xc9K%e\xfc\xae\xf8K\xda=\x1d\xa0\x99\xed\xb5\xe5/\x17\xb3i\xcf\x11\xd6\xdd;\x87\x91\x03\x13M\xe0B\xf4\x10FJI dp\xa4\xaei\x7fp\xfb\xe7\xa5\x00Jt\x90\xd9P\xf1\x9fx\x1a\xda\xa2>\xbcG\x1a \xf80\xdc\xdb\x95R\x9e\xb7:\xfdj\xe3\x0e\xf2\xd7\xbcK\xc1{;i\xef\x08\x98 
"\x91\xddi\xa3d\xcf\x11\xae]\xfb.\xa5\x07\x8f\x9b\xc0Os\x96\xd7\x170\xde(\xdaQD\xf5\xdb\x87\xcdNc\xa4\xd7[\xb5\xbb\x81\x01\xa3\xd2I\xcc\x98\x8c\xa2\xc8h\x05\x16Y\x8a`\xd7\xb1VV\xce\x1e\x82US\xce\xca\x18\xbb~\xa6\xf4`\r\xf9\xbf\xd9\xc8\x86\x9d\x87h\xf2t\x80\x94\xb8\x9c6r\x87\xa4\x90\xd9?\x0e_ \x14\x05\xe6\xf6\xfa\xd8\xf2\xd5Q\xa6\x8d\x18Dz\x92\xab\xdb>\xe1\xe7\xf2\x95\xe2]-\xeb\xd7\xac_L\xfb\xc1F\xf2V\x9c\xc2\xda\xe5\xf3,\x0c\x9bYK\xc6\xe5Ih\xb6\x9e\x92\xea\x06\xf3F&\xf3\xee}\xe3p\xf62\xff8\x15\\ \xa4Ss\xb2\x8d_\xbe\xb3\x9dW\x8b\xf6\x80U\x03!\xc8\xee\x1f\xcf3w\xcfb\xe1\x84\xacn\xdf\x7ff\xd3.\xfe\xf5\x8f\xff\x8b?\xa4\x83n02#\x89w\x1e\xb9\x96\xd1\xe1\x80\x18I\x1b\xf7V70\xf6\xfe\'\xd7P\\\xf03\xf2V\xc2\x96\xb5]{\xbf+\xa0t\xbdA\xe6\xb8]\xf8=w!\x10\x18\xba\xc4\xd0\x05F\x08\xf4\x10\xc8\x10\x87k\xdc\x94\xec?N\xbaK%#\xc9\x85\xd6\xa5y\x1d\x01\xd1\xd0\xda\xce\xd6\x03\xc7y~\xcb\x1e\xee{\xa9\x88\xcf\x0f\x1c\x07\x9b\x05$\xdc~\xe5H6\xfd\xecFl\x16\x957\xff\xb6\x8f?\x94\x1e\xa0\xb8\xac\n\xb7\xd7\xc7=3G\xb3\xb7\xba\x89}\xc7\xdd\xa0\x08\x1a\x1b=\xfc\xad\xbc\x961\xe9Id\xa6\xf4\x03\xe0\xa3\x83\'\xc4\xddk\xff\xf8I\xd3\x9f\x7f~\x17\x00\xc3\xa7@\xc5\xf6S\x18\x89\xccF\x16\xaeZ\x8d\xc3\xf5+4\x9b\x99\xf7D\xab\xb3p7C\xd7M\xb5\x18\x9c\xcc\x98A\x89$\xba\x1ch\x9a\x8a\xdf\x1f\xa4\xb2\xbe\x85\xc3\r\xad\x94\xd7\x9d\xc4\xe3\xf1\x99\xe37! 
\xa43/w\x08E\xabo\xe6\xad\xd2\x03,_\xff)u\x8d\xad\x9d\x9ak\xd5x\xfb\xe1\x85\xe8Rr\xebs\x1ft\x89_:\xa9\x89\xb1\x8c\xcfL\x95zL\xb2\xd8{\xa0\xfcH\xdd\xde=3\xd8\xba\xeeX\xd71yw a\x9a\x98\xbb\xd4\x89#\xeeu\x9c\xf1\xb7D\x07\x95g0\xe2Nu\x92\x18\x86\xec=\xb35$)qN&g\xf5\xa7\xb0\xac\n]7:\xff\x1f6\xe8\xf4\xc4X\xfc\xbaNC\xab\xafg\x91\xd5/\r\xf4\x90\x87\xda\x03\xd7\xb2\xe97\x9f\x92\xb7\x02\xb6\x14\x9ca\x18\x1a\x19\xbbM\xbd\xd7E\xd2\xc0\xcd8\x13\xa6\x86=\x99\t\xe8l\x83\xa1\x94\x9d\x85\x90\xe8\x92\xe9\x1a\xf2\xf4\xb3\x94\x88\x87\x92\xd2L&#\x13_W\xaa@\xb5z8\xb2\xf3\xa7\x14\x16\xbc\xd2\xdbW{\xee\xb8y-\xe4\xad\x80\xd2\xd7<4\x1e\xcf\xa7\xad\xf9/\xe1\xacT\x84\xd99+\x10\x0e\xab\x85I\xc3\xd3\xb0j*\x04\xf5N\x17\xdb\x15DW\xc1u\x03\x02!\x92\\\x0e\xee\x9c>\n\x0cC\xa2j\x92\xb8\x01\x02\xc3h\xe1\xc8\x8eeQ\x10\xbd\xfc\xee\xab\xf7ah\xe56\xc8[\x0e\x9f\xbc\x10`p\xee\xfb\xe6\xcd\x8a\xe9\xa8\x96\xb3\xa3C\xc2\xfc\xdc!lxd!\xb3G\xa5S\xef\xf5QY{\x12:\xfc\xa6\xc0\xba\x0e!\xddL\xe5\x03A\xd0%\x99\x03\x13y`n.O\xdf5\x93E\x13\x87\xf1\xf4\xa62A|\x9a\xa0\xad\xe9\x10\xf5\xe5w\xb2\xb9`cd\x84@\xe1\xda^\x86\'gZ\x11c\x9a\xbfTA\xb1/\xc2\xee|\x1dG\\?\xa0g\x9c9\x85\x91\xac\xfe\xf1\xbc\xbfb\x119\xe9I\xf8\x02!Z;\x02\xec?\xee\xa6\xba\xa9\x15\xb7\xc7\x87aH\xfa\xc5X\xc9L\x89#;-\x9e\xfeqN\xec\x16\x8d\x17\x8bw\xcbU\x9b\xca\xc5I\xbf\x02\xf5\xe5\xef\xe3mZ\xcc\x96\xa7j\xba\xc9\xd3\xeb\x14\xe8\xdbV\xfe\n\x90\n\x14>\t\xb3\x7f\x9c\x8c3\xf65\x1cqW\xa1Y\x1d\xdd:~\xa7\x023\x0c\x84\x10\xac\xbcv\x12\x0f\xcc\x19K\x8a\xcb\x81USP\x15%j\xdf\xbaa\x10\xd4\r\xdcm~\xde\xff\xe20\xcf\x17\xef\xa5\xd2c\t\xe1on\xa2v\xdf\xe3\x14\xff\xf6w\xe1\xc1,l\xe9\xcb/\x1f\xba\xb1\xd3\x85\xd2\xf9\xcb\xe6a\x8fy\x00\x8bc\x06\x8a\x9a\x86\xaau\x02\xea\xda\x97\x95\x12\x02!\x90\x92!\xe9I\x0cK\x8d\'\xc6n&\x9e\x1d\x81\xa0lm\x0f\x88\x1ao\x90\xea\x9a\x16\xd3\xb8m\xf6]\xb4T\x7f\xc07\xbb\x9f\xe5\xcb\xbf\xb8\xcf\xa4J\xe7\x0f\x04\xe8\xe6\xf2.\x9b\xa90db.\xf6\x98\x19\x08\xe5\x1aTm\x0e\x8aEE\xb3t\x8f;]\xca\x80hW]\xb3\x85\xdf\x0b\x814\xaa\x11\xca\x16\x8c\xc0fj\x0el\xa7\xf4\xf5\xda
\x1eg\x9d\xd5\x80\xf1|\xd6\xfceP\xf4\xb4\xf9|\xd6\x83\x02\xab\xd3\x81\xd5\x11\x83\x94\x93\x90\xc6,T\xcb8\x145\x1bEI\x02\xe1D\x08\x15)C\x08\xc5\x83P\xeb\x08u\xec#\x14\xdc\x89\xc5\xbe\x05\xa8\xa2\xb9\xa6\x83O^\n\xf4\xa5D\xf8\x7fM\xdd#\x1d\xf5i\xe0U\x00\x00\x00\x00IEND\xaeB`\x82'

fn = 'image.png'
open(fn, 'wb').write(image_content)

print get_text_hash(image_content)
print get_file_hash(fn)
print get_text_hash(open(fn, 'rb').read())
print get_text_hash(open(fn, 'r').read())

Here is my output on Ubuntu:

[user@ubuntu:~]# python --version
Python 2.7.3
[user@ubuntu:~]# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.1 LTS
Release:    12.04
Codename:   precise
[user@ubuntu:~]# python test_hash.py
86f43abba8d46fe17cfb6dd1a986b1d086e81a5ad6e3d5c518dc10e8c7229dc366a4129275a609cda7b87cf0afaf82295421098159327e46b7541e2063341eb2
86f43abba8d46fe17cfb6dd1a986b1d086e81a5ad6e3d5c518dc10e8c7229dc366a4129275a609cda7b87cf0afaf82295421098159327e46b7541e2063341eb2
86f43abba8d46fe17cfb6dd1a986b1d086e81a5ad6e3d5c518dc10e8c7229dc366a4129275a609cda7b87cf0afaf82295421098159327e46b7541e2063341eb2
86f43abba8d46fe17cfb6dd1a986b1d086e81a5ad6e3d5c518dc10e8c7229dc366a4129275a609cda7b87cf0afaf82295421098159327e46b7541e2063341eb2

And here's my output on Fedora 13:

[user@fedora:~]# python --version
Python 2.6.4
[user@fedora:~]# cat /etc/fedora-release 
Fedora release 13 (Goddard)
[user@fedora:~]# python test_hash.py
3d29b0b5ebf500f22b459e1888508d194ffd20707413b69730228e1bd124698f5596a6ef9aadf593b216dba3603e8e2a6871e49a23aadc1b4985bbba92275975
1f0afa06c96447bb15a0389c738e14faa2156bc4a71f03d0300eab6e4386b46cfc7ea3514c6160a2be4213dc1fff80a7aac7e726f01807badd3b3d7edc4f8410
3d29b0b5ebf500f22b459e1888508d194ffd20707413b69730228e1bd124698f5596a6ef9aadf593b216dba3603e8e2a6871e49a23aadc1b4985bbba92275975
3d29b0b5ebf500f22b459e1888508d194ffd20707413b69730228e1bd124698f5596a6ef9aadf593b216dba3603e8e2a6871e49a23aadc1b4985bbba92275975

Here's my output on Fedora 16:

[user@fedora:~]# python --version
Python 2.7.3
[user@fedora:~]# cat /etc/fedora-release 
Fedora release 16 (Verne)
[user@fedora:~]# python test_hash.py
86f43abba8d46fe17cfb6dd1a986b1d086e81a5ad6e3d5c518dc10e8c7229dc366a4129275a609cda7b87cf0afaf82295421098159327e46b7541e2063341eb2
86f43abba8d46fe17cfb6dd1a986b1d086e81a5ad6e3d5c518dc10e8c7229dc366a4129275a609cda7b87cf0afaf82295421098159327e46b7541e2063341eb2
86f43abba8d46fe17cfb6dd1a986b1d086e81a5ad6e3d5c518dc10e8c7229dc366a4129275a609cda7b87cf0afaf82295421098159327e46b7541e2063341eb2
86f43abba8d46fe17cfb6dd1a986b1d086e81a5ad6e3d5c518dc10e8c7229dc366a4129275a609cda7b87cf0afaf82295421098159327e46b7541e2063341eb2

Notice the issue seems limited to Fedora 13 or Python 2.6. What would be causing this?

1 answer

#1

I don't understand what you're trying to do with unicode here, why you're using readlines() with a binary file, or even whether or not your file will have been flushed when you read it (you open a file and write to it, but you neither close nor flush it before you read from it). None of your hashes are right, or so it seems to me.
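
A minimal Python 3 sketch of the two fixes implied above: write the file inside a with block so it is flushed and closed before it is read back, then hash it in binary mode in fixed-size chunks instead of with readlines(). The helper name file_sha512, the 64 KiB chunk size, and the use of a temporary file holding only the PNG signature are illustrative assumptions, not anything from the question.

```python
import hashlib
import os
import tempfile


def file_sha512(path, chunk_size=64 * 1024):
    """Return the SHA-512 hex digest of a file's raw bytes, read in chunks."""
    h = hashlib.sha512()
    # Binary mode: bytes come back exactly as stored, with no decoding
    # and no newline translation.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Stand-in for image_content: just the 8-byte PNG signature.
data = b"\x89PNG\r\n\x1a\n"

fd, path = tempfile.mkstemp(suffix=".png")
try:
    # Leaving the with block flushes and closes the file before it is read back.
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    digest = file_sha512(path)
finally:
    os.remove(path)

print(digest == hashlib.sha512(data).hexdigest())  # True
```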

~/coding$ sha512sum image.png 
fc704b0cbc532793ba727df9520e8046adf079a80faaffc2d4741ea61a7a506cb79e31ee4338c17d0acf77a9290a969ac63c0e672be09e5c8fc4dcd32c8b62ee  image.png
~/coding$ python
Python 2.7.3 (default, Aug  1 2012, 05:16:07) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> hashlib.sha512(open("image.png").read()).hexdigest()
'fc704b0cbc532793ba727df9520e8046adf079a80faaffc2d4741ea61a7a506cb79e31ee4338c17d0acf77a9290a969ac63c0e672be09e5c8fc4dcd32c8b62ee'

This even works in 2.5.6:

Python 2.5.6 (r256:88840, Jul 12 2012, 12:21:58) 
[GCC 4.6.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> hashlib.sha512(open("image.png").read()).hexdigest()
'fc704b0cbc532793ba727df9520e8046adf079a80faaffc2d4741ea61a7a506cb79e31ee4338c17d0acf77a9290a969ac63c0e672be09e5c8fc4dcd32c8b62ee'
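
This agrees across Python versions and with sha512sum because SHA-512 is a fixed standard (FIPS 180-4): identical input bytes give an identical digest on any platform. A quick Python 3 sanity check against the well-known SHA-512 test vector for the message "abc", also showing that feeding the same bytes in pieces makes no difference:

```python
import hashlib

# Standard SHA-512 test vector for the 3-byte message "abc".
EXPECTED = (
    "ddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a"
    "2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f"
)

# One-shot hashing of the whole message.
one_shot = hashlib.sha512(b"abc").hexdigest()

# Incremental hashing: update() calls behave like hashing the concatenation.
h = hashlib.sha512()
for piece in (b"a", b"bc"):
    h.update(piece)
incremental = h.hexdigest()

print(one_shot == EXPECTED, incremental == EXPECTED)  # True True
```

So if two machines print different digests for what should be the same data, the bytes passed to update() must differ.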

If you compare even the first few bytes of image_content with what you're actually encoding in (say) get_text_hash, you see the difference:

>>> image_content[:6]
'\x89PNG\r\n'

Now you decode this as UTF-8 and replace anything that isn't valid UTF-8 (here the leading \x89 byte) with U+FFFD:

>>> unicode(image_content[:6], errors='replace', encoding='utf-8')
u'\ufffdPNG\r\n'

And then you encode this again:

>>> unicode(image_content[:6], errors='replace', encoding='utf-8').encode('utf-8')
'\xef\xbf\xbdPNG\r\n'

It's this last byte string that you're hashing, and it is obviously different from the original.
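
The same round trip in Python 3, where the bytes/str distinction is explicit, using the 6-byte prefix shown above:

```python
import hashlib

raw = b"\x89PNG\r\n"  # first 6 bytes of the PNG in the question

# 0x89 is not a valid UTF-8 start byte, so errors="replace" turns it into U+FFFD.
text = raw.decode("utf-8", errors="replace")
round_tripped = text.encode("utf-8")

print(ascii(text))          # '\ufffdPNG\r\n'
print(repr(round_tripped))  # b'\xef\xbf\xbdPNG\r\n'
print(hashlib.sha512(raw).hexdigest() == hashlib.sha512(round_tripped).hexdigest())  # False
```

The one-byte \x89 has become the three bytes \xef\xbf\xbd, so the digest is of different data.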

To address the particulars of the question (the above was simply too long for a comment): as @gnibbler noted immediately, it's far more likely that the byte strings you end up hashing differ between the machines than that the hash itself does. Put in a few print statements to compare what the different versions are doing, and you'll find the problem quickly. (For what it's worth, my guess is that the unicode decoding/encoding step gives different results.)
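
A minimal Python 3 sketch of that debugging step: print exactly which bytes reach the hash object, so the output from two machines can be diffed line by line. hash_with_trace is a hypothetical helper, and showing only the first 32 bytes is an arbitrary choice to keep the output short.

```python
import hashlib
import sys


def hash_with_trace(data, label="input", out=None):
    """SHA-512 the given bytes and print a one-line trace of exactly what was hashed."""
    if not isinstance(data, bytes):
        raise TypeError("expected bytes, got %s" % type(data).__name__)
    digest = hashlib.sha512(data).hexdigest()
    # Length, the first 32 bytes and the digest are usually enough to spot where
    # two machines diverge (for example an extra U+FFFD or a missing byte).
    print("%s: len=%d head=%r sha512=%s" % (label, len(data), data[:32], digest),
          file=out if out is not None else sys.stdout)
    return digest
```

Running it on the exact bytes each code path passes to update(), on each machine, and diffing the printed lines should show whether the inputs already differ before hashing.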
