MySQL: Hamming distance between two phashes

Time: 2022-01-02 19:14:53

I have a table A with a column 'template_phash', in which I store the phashes generated from 400K images.

Now I take a random image and generate a phash from that image.

How do I query table A for the records whose Hamming distance from this phash is below a threshold, say 20?

I have seen Hamming distance on binary strings in SQL, but couldn't figure it out.

I think I need to write a function to achieve this, but how?

Both of my phashes are stored as BIGINT, e.g. 7641692061273169067.

Please help me write the function so that I can query like this:

SELECT product_id, HAMMING_DISTANCE(phash1, phash2) AS hd
FROM A
WHERE hd < 20 ORDER BY hd ASC;

1 solution

#1


21  

I figured out that the Hamming distance is just the number of bits that differ between the two hashes. First XOR the two hashes, then count the 1 bits in the result:

SELECT product_id, BIT_COUNT(phash1 ^ phash2) AS hd FROM A ORDER BY hd ASC;
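
To get the thresholded query asked for in the question, something like the following might work. This is only a sketch under a few assumptions: the hashes are stored as signed BIGINT in the 'template_phash' column, the user variable @query_phash (a name made up here) holds the phash of the new image, and the function HAMMING_DISTANCE is a hypothetical wrapper, not a MySQL built-in. Note that MySQL does not allow a column alias such as hd in a WHERE clause, so the filter goes in HAVING instead.

```sql
-- Tiny sanity check: 5 = 101b, 3 = 011b, 5 ^ 3 = 110b, which has two 1 bits.
SELECT BIT_COUNT(5 ^ 3);  -- returns 2

-- Hypothetical wrapper so queries can call HAMMING_DISTANCE(a, b).
-- Assumes both arguments fit in a signed BIGINT.
-- A single-statement body needs no DELIMITER change.
CREATE FUNCTION HAMMING_DISTANCE(a BIGINT, b BIGINT)
RETURNS TINYINT UNSIGNED
DETERMINISTIC
RETURN BIT_COUNT(a ^ b);

-- Assumed to hold the phash of the image being looked up;
-- the literal is the example value from the question.
SET @query_phash = 7641692061273169067;

-- Keep only rows within the threshold. The alias hd is not visible
-- in WHERE, so the filter is applied in HAVING.
SELECT product_id,
       HAMMING_DISTANCE(template_phash, @query_phash) AS hd
FROM A
HAVING hd < 20
ORDER BY hd ASC;
```

Calling BIT_COUNT(template_phash ^ @query_phash) directly in the SELECT list gives the same result without the function. Either way the expression has to be evaluated for every row, so this scans all 400K rows.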
