快速计算汉明距离

时间:2021-10-01 19:14:54

I read the Wikipedia article on Hamming Weight and noticed something interesting:

我读了*关于汉明体重的文章,发现了一些有趣的事情:

It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string. In this binary case, it is also called the population count, popcount or sideways sum.

因此,它等价于与相同长度的全零弦的汉明距离。对于最典型的情况,一串比特,这是字符串中1的个数。在这种二进制情况下,它也被称为人口计数、popcount或横向和。

[emphasis mine]

(强调我的)

So something occurred to me. Could I calculate the Hamming Distance between two strings by XORing them and then taking the Hamming Weight (POPCOUNT) of the resulting string?

我突然想起来了。我可以通过XORing它们然后取结果字符串的Hamming Weight (POPCOUNT)来计算两个字符串之间的汉明距离吗?

Something along the lines of this (using gcc intrinsics):

有一些类似的东西(使用gcc intrinsic):

#include <stdint.h>

int hammingDistance (uint64_t x, uint64_t y) {
        uint64_t res = x ^ y;
        return __builtin_popcountll (res);
}

Now, as for why I would want to do this, well, on some platforms, yes, this would just translate to gcc emitting a call to a function that calculates popcount. For instance, on x64 without popcnt, gcc spits out (Godbolt's GCC Online):

至于我为什么要这么做,嗯,在某些平台上,是的,这将转化为对计算popcount的函数发出调用的gcc。例如,在没有popcnt的x64上,gcc会吐出(Godbolt的gcc在线):

hammingDistance:
    sub rsp, 8
    xor rdi, rsi
    call    __popcountdi2
    add rsp, 8
    ret

OTOH, if you have a platform that supports POPCOUNT, like x64 models including nehalem and after (which have POPCNT), you get (Godbolt's GCC Online):

OTOH,如果你有一个支持POPCOUNT的平台,比如x64模型,包括nehalem和after(有POPCNT),你会得到(Godbolt的GCC在线):

hammingDistance:
    xor rdi, rsi
    popcnt  rax, rdi
    ret

which should be waaay faster, especially once inlined.

这应该更快,特别是在内衬。


But back to the original question. Can you take the Hamming Weight of the XOR of two strings to find their Hamming Distance? ie:

回到最初的问题。你能取两个弦的x的汉明重量来求它们的汉明距离吗?即:

HD = HW (x xor y)

2 个解决方案

#1


5  

Hamming distance between two equal length strings, x and y, is defined to be the number of positions where they differ. In the case of x and y being bitstrings, x^y is a string with 1s in exactly the positions they differ. Thus, HammingDistance(x,y) = Number of 1s in x^y, for bitstrings. Also, HammingWeight(x) = number of 1s in x for a bitstring x. Thus, your first claim, HammingDistance(x,y) = HammingWeight(x^y) is true for bitstrings. Having established that, it is clear that your implementation is correct.

两个长度相等的弦,x和y之间的汉明距离,定义为它们不同的位置数。在x和y的位串中,x ^ y是一个字符串1 s的职位他们不同。因此,HammingDistance(x,y)= x ^ y的1 s,位串。此外,HammingWeight(x)= x的一个位串的1 s x。因此,你的第一个要求,HammingDistance(x,y)= HammingWeight(x ^ y)位串。在确定了这一点之后,很明显您的实现是正确的。

#2


3  

Yes, that works. For each bit, the bit is 1 if and only if the input bits are different. Hence, applied to a whole bit vector, the result has as many one bits (HW) as the inputs have differing bits (HD). And your code seems to exploit that relationship perfectly well. In fact, this shortcut is even mentioned further into the Hamming weight article you link to (Efficient implementation):

是的,工作。对于每个位,当且仅当输入位不同时,位为1。因此,应用于整个位向量,结果具有与输入具有不同位(HD)相同的多个位(HW)。你的代码似乎很好地利用了这种关系。事实上,在你链接到的汉明权重文章(高效实现)中,甚至还提到了这个快捷方式:

The Hamming distance of two words A and B can be calculated as the Hamming weight of A xor B.

两个单词A和B的汉明距离可以计算为A或B的汉明重量。

#1


5  

Hamming distance between two equal length strings, x and y, is defined to be the number of positions where they differ. In the case of x and y being bitstrings, x^y is a string with 1s in exactly the positions they differ. Thus, HammingDistance(x,y) = Number of 1s in x^y, for bitstrings. Also, HammingWeight(x) = number of 1s in x for a bitstring x. Thus, your first claim, HammingDistance(x,y) = HammingWeight(x^y) is true for bitstrings. Having established that, it is clear that your implementation is correct.

两个长度相等的弦,x和y之间的汉明距离,定义为它们不同的位置数。在x和y的位串中,x ^ y是一个字符串1 s的职位他们不同。因此,HammingDistance(x,y)= x ^ y的1 s,位串。此外,HammingWeight(x)= x的一个位串的1 s x。因此,你的第一个要求,HammingDistance(x,y)= HammingWeight(x ^ y)位串。在确定了这一点之后,很明显您的实现是正确的。

#2


3  

Yes, that works. For each bit, the bit is 1 if and only if the input bits are different. Hence, applied to a whole bit vector, the result has as many one bits (HW) as the inputs have differing bits (HD). And your code seems to exploit that relationship perfectly well. In fact, this shortcut is even mentioned further into the Hamming weight article you link to (Efficient implementation):

是的,工作。对于每个位,当且仅当输入位不同时,位为1。因此,应用于整个位向量,结果具有与输入具有不同位(HD)相同的多个位(HW)。你的代码似乎很好地利用了这种关系。事实上,在你链接到的汉明权重文章(高效实现)中,甚至还提到了这个快捷方式:

The Hamming distance of two words A and B can be calculated as the Hamming weight of A xor B.

两个单词A和B的汉明距离可以计算为A或B的汉明重量。