What algorithm can identify whether images are the "same" or similar, regardless of their size?

Date: 2022-01-14 21:28:58

TinEye, the "reverse image search engine", lets you upload or link to an image, searches through the billions of images it has crawled, and returns links to copies of that same image.


However, it isn't a naive checksum or anything related to that. It is often able to find images at both higher and lower resolutions, and both larger and smaller sizes, than the original image you supply. This is a good use for the service, because I often find an image and want the highest-resolution version of it possible.


Not only that, but I've had it find images of the same image set, where the people in the image are in a different position but the background largely stays the same.


What type of algorithm could TinEye be using that would allow it to compare an image with others of various sizes and compression ratios and yet still accurately figure out that they are the "same" image or set?


8 Answers

#1


These algorithms are usually fingerprint-based. A fingerprint is a reasonably small data structure, something like a long hash code. However, the goals of a fingerprint function are the opposite of those of a hash function. A good hash function should generate very different codes for very similar (but not equal) objects. A fingerprint function should, on the contrary, generate the same fingerprint for similar images.


Just to give you an example, here is a (not particularly good) fingerprint function: resize the picture to a 32x32 square, then normalize and quantize the colors, reducing the number of colors to something like 256. You then have a 1024-byte fingerprint for the image. Just keep a table of fingerprint => [list of image URLs]. When you need to look up images similar to a given image, just calculate its fingerprint value and find the corresponding image list. Easy.

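That toy scheme can be sketched in a few lines of pure Python. The resize step is assumed to happen elsewhere, and the 3-3-2 bit packing is my choice of how to get down to 256 colors; the answer doesn't specify one:

```python
def quantize_pixel(r, g, b):
    # Pack a pixel into one byte: 3 bits red, 3 bits green, 2 bits blue,
    # giving the ~256 colors mentioned above (this exact packing is an
    # assumption for the sketch).
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)

def fingerprint(pixels):
    # pixels: a 32x32 grid of (r, g, b) tuples, assumed already resized
    # and normalized. Result: a 1024-byte fingerprint (one byte per pixel).
    return bytes(quantize_pixel(r, g, b) for row in pixels for (r, g, b) in row)

# The lookup table: fingerprint => [list of image URLs]
index = {}

def add_image(url, pixels):
    index.setdefault(fingerprint(pixels), []).append(url)

def find_similar(pixels):
    return index.get(fingerprint(pixels), [])
```

An exact-match table like this only finds bit-identical fingerprints, which is precisely why the answer calls the function "not particularly good": any crop or contrast change produces a different key.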

What is not easy: to be useful in practice, the fingerprint function needs to be robust against crops, affine transforms, contrast changes, etc. Constructing good fingerprint functions is a separate research topic. Quite often they are hand-tuned and use a lot of heuristics (e.g. knowledge about typical photo contents, about the image format, additional data in EXIF, etc.).


Another variation is to use more than one fingerprint function: apply each of them and combine the results. Actually, it's similar to finding similar texts. Instead of a "bag of words", the image similarity search uses a "bag of fingerprints" and counts how many elements from one bag match elements from the other. How to make this search efficient is another topic.

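A minimal sketch of the bag-of-fingerprints comparison. The multiset intersection is the counting step described above; the Jaccard-style score is my choice of how to combine it into a single similarity number:

```python
from collections import Counter

def shared_fingerprints(bag_a, bag_b):
    # Multiset intersection: how many fingerprints appear in both bags.
    return sum((Counter(bag_a) & Counter(bag_b)).values())

def bag_similarity(bag_a, bag_b):
    # Jaccard-style score in [0, 1]: shared fingerprints divided by the
    # total number of distinct fingerprint occurrences across both bags.
    shared = shared_fingerprints(bag_a, bag_b)
    total = len(bag_a) + len(bag_b) - shared
    return shared / total if total else 1.0
```

Making this efficient over billions of images is the hard part the answer alludes to; inverted indexes and locality-sensitive hashing are the usual tools.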

Now, regarding articles/papers: I couldn't find a good article that gives an overview of the different methods. Most of the public articles I know discuss specific improvements to specific methods. I'd recommend checking these:


"Content Fingerprinting Using Wavelets". This article is about audio fingerprinting using wavelets, but the same method can be adapted for image fingerprinting.


"Permutation Grouping: Intelligent Hash Function Design for Audio & Image Retrieval". Info on locality-sensitive hashing.


"Bundling Features for Large Scale Partial-Duplicate Web Image Search". A very good article; it talks about SIFT and bundling features for efficiency. It also has a nice bibliography at the end.


#2


It's probably based on improvements to feature extraction algorithms, taking advantage of features that are scale-invariant.


Take a look at


or, if you are REALLY interested, you can shell out some 70 bucks (or at least look at the Google preview) for


#3


The creator of the FotoForensics site posted a blog post on this topic. It was very useful to me, and it shows algorithms that may be good enough for you and that require a lot less work than wavelets and feature extraction.


http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html

aHash (also called Average Hash or Mean Hash). This approach crushes the image into a grayscale 8x8 image and sets the 64 bits in the hash based on whether the pixel's value is greater than the average color for the image.

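The aHash recipe is short enough to sketch directly in pure Python. The 8x8 grayscale shrink is assumed to happen elsewhere:

```python
def ahash(gray):
    # gray: an 8x8 grid of grayscale values (0-255), already shrunk from
    # the full image. Each of the 64 bits records whether a pixel is
    # brighter than the image's mean value.
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(h1, h2):
    # Number of differing bits between two 64-bit hashes; a small
    # distance means "probably the same image".
    return bin(h1 ^ h2).count("1")
```

Comparison is then a Hamming distance between the 64-bit values rather than an exact-match lookup, which is what makes these hashes tolerant of small changes.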

pHash (also called "Perceptive Hash"). This algorithm is similar to aHash but uses a discrete cosine transform (DCT) and compares based on frequencies rather than color values.

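A rough pHash sketch using a naive DCT-II. A real implementation would use a fast transform; the 32x32 input size and the median threshold over the low-frequency block follow the blog post's description:

```python
import math

def dct_2d(block):
    # Naive 2-D DCT-II of an NxN block; fine for a sketch, slow for real use.
    n = len(block)
    def dct_1d(vec):
        return [sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(vec)) for k in range(n)]
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back

def phash(gray32):
    # gray32: a 32x32 grayscale grid. Keep the top-left 8x8 block of
    # low-frequency DCT coefficients and threshold each against the
    # median (computed excluding the large DC term).
    coeffs = dct_2d(gray32)
    low = [coeffs[y][x] for y in range(8) for x in range(8)]
    median = sorted(low[1:])[len(low[1:]) // 2]
    bits = 0
    for v in low:
        bits = (bits << 1) | (1 if v > median else 0)
    return bits
```

Because the bits come from frequency structure rather than raw pixel values, pHash tends to survive contrast and brightness adjustments better than aHash.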

dHash. Like aHash and pHash, dHash is pretty simple to implement and is far more accurate than it has any right to be. As an implementation, dHash is nearly identical to aHash, but it performs much better. While aHash focuses on average values and pHash evaluates frequency patterns, dHash tracks gradients.

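dHash is even shorter: each bit is the sign of the horizontal gradient between neighbouring pixels. The sketch below assumes the common 9x8 shrink (8 rows of 9 pixels, so each row yields 8 comparisons):

```python
def dhash(gray):
    # gray: 8 rows of 9 grayscale values each. Each of the 64 bits records
    # whether a pixel is brighter than its right-hand neighbour.
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits
```

Tracking relative gradients instead of absolute values is what makes dHash robust to global brightness and contrast shifts.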

#4


The Hough transform is a very old feature extraction algorithm that you might find interesting. I doubt it's what TinEye uses, but it's a good, simple starting place for learning about feature extraction.


There are also slides from a neat talk by some University of Toronto folks about their work on astrometry.net. They developed an algorithm for matching telescope images of the night sky to locations in star catalogs in order to identify the features in the image. It's a more specific problem than the one TinEye tries to solve, but I'd expect a lot of the basic ideas they talk about to be applicable to the more general problem.


#5


http://tineye.com/faq#how

Based on this, Igor Krivokon's answer seems to be on the mark.


#6


They may well be doing a Fourier transform to characterize the complexity of the image, as well as a histogram to characterize the chromatic distribution, paired with a region-categorization algorithm to ensure that similarly complex and similarly colored images don't get wrongly paired. I don't know if that's what they're using, but it seems like it would do the trick.

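The histogram half of that idea is easy to sketch: coarse, normalised colour histograms catch images with similar palettes. The bin count and the L1 distance below are my choices, not anything the answer specifies:

```python
def color_histogram(pixels, bins=8):
    # Coarse per-channel histogram of an image's (r, g, b) pixels,
    # normalised so each channel's bins sum to 1.
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        hist[r * bins // 256] += 1
        hist[bins + g * bins // 256] += 1
        hist[2 * bins + b * bins // 256] += 1
    return [h / len(pixels) for h in hist]

def histogram_distance(h1, h2):
    # L1 distance between two normalised histograms; 0 means identical
    # palettes, 6 is the maximum for fully disjoint ones.
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

On its own this is a weak signal (any two sunsets look alike to a histogram), which is why the answer pairs it with complexity and region cues.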

#7


Check out this blog post (not mine) for a very understandable description of an algorithm that seems to get good results for how simple it is. It basically partitions each picture into a very coarse grid, sorts the grid cells by their red:blue and green:blue ratios, and checks whether the resulting orderings are the same. This naturally works for color images only.

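A sketch of that grid-and-sort idea. The per-cell colour sums and the zero-blue guard are my assumptions; the point is that the *ordering* of cells by colour ratio survives resizing and recompression even though absolute pixel values don't:

```python
def rank_cells(cells, ratio):
    # Order the grid cells by a colour ratio; the ordering itself is the
    # signature, so overall brightness and scale don't matter.
    return tuple(sorted(range(len(cells)), key=lambda i: ratio(cells[i])))

def grid_signature(cells):
    # cells: per-cell (red, green, blue) sums from a very coarse grid.
    # max(..., 1) guards against division by zero for blue-free cells.
    rb = rank_cells(cells, lambda c: c[0] / max(c[2], 1))  # red:blue
    gb = rank_cells(cells, lambda c: c[1] / max(c[2], 1))  # green:blue
    return (rb, gb)

def probably_same(cells_a, cells_b):
    return grid_signature(cells_a) == grid_signature(cells_b)
```

For example, uniformly brightening an image scales every cell's sums by the same factor, leaving every ratio (and hence the ordering) unchanged.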

The pros most likely get better results using vastly more advanced algorithms. As mentioned in the comments on that blog, a leading approach seems to be wavelets.


#8


What about resizing the pictures to a standard small size and comparing SSIM or luma-only PSNR values? That's what I would do.

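A luma-only PSNR check along those lines might look like this, with both images assumed already resized to the same small standard size and converted to 8-bit grayscale:

```python
import math

def luma_psnr(gray_a, gray_b):
    # Peak signal-to-noise ratio over 8-bit luma values. Higher means
    # more similar; identical images give infinity.
    flat_a = [v for row in gray_a for v in row]
    flat_b = [v for row in gray_b for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(255 ** 2 / mse)
```

A threshold (say, above roughly 30 dB) would then count as "probably the same image"; SSIM is more involved but follows the same resize-then-compare pattern.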
