Finding similar images in a database using an image instead of tags

Time: 2021-07-29 00:20:55

Well, the question is simple: I want to find similar images given a query image, similar to what TinEye does. Suppose I have a shirt with the following description:

Sleeve length: full

collar: present

pattern: striped

(The above data is just to give you a feel for the image; I don't actually have this data.)

[Four example shirt images: the query image followed by visually similar shirts]

The first image is the query image and the rest should be the output of the similarity-finding algorithm. Based on the example, we have some flexibility: we can show the user an image with a different color, since all the images share the same pattern, collar type, and sleeve length. So I have to return outputs that are visually similar.

There are similar threads on Stack Overflow (and links from them), and many others besides, but I am confused about which approach to follow.

In my case I don't have to search across categories; I only search within the same category, e.g. if the input is a shirt I will search the shirt category only. That part has been done.

So the question is: what are the approaches to handle this problem? Color is not a big issue; color information can easily be extracted through a color histogram. Say the input is a round-neck T-shirt, i.e. without a collar, half sleeve, and printed with text at the center. The output should then be similar images: half sleeve, round neck, and printed text at the center, though the text may vary. I tried K-Means clustering and p-hash but they didn't work. Please enlighten me.
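
For example, something like this minimal sketch of the color-histogram extraction (the file name is hypothetical):

import cv2

img = cv2.imread('shirt.jpg') #hypothetical input image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
#2D histogram over Hue and Saturation (Value is dropped to reduce sensitivity to lighting)
hist = cv2.calcHist([hsv], [0, 1], None, [36, 32], [0, 180, 0, 256])
hist = cv2.normalize(hist, None, 0, 1, cv2.NORM_MINMAX) #make it comparable across image sizes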

PS: I have to find similar images, not duplicates.

1 solution

#1


I would try to split this problem into 3 smaller problems:

  • checking whether the image shows a shirt with long or short sleeves

  • checking the pattern (striped, plain, something else?)

  • determining the color of the shirt


Checking whether the image shows a shirt with long or short sleeves
This one is, in my opinion, the easiest. You mentioned that you have the category name, but based on Google image results it may not be obvious whether a Shirt or a TShirt has long or short sleeves.
My solution is quite simple:

  • Find the face in the image

  • Use the grabcut algorithm to extract the face mask from the image

  • Mask the face (so after this step only the face is left; everything else is black). Note that this step is not necessary; I mention it only because it is shown in the final image.

  • Convert the image to HSV color space

  • Using the face mask, calculate the histogram of the H and S color channels of the FACE ONLY (without the rest of the image)

  • Calculate the back projection of the HSV image using the histogram from the previous step. Thanks to that you will get only regions whose color (in HSV) is similar to the color of the face, i.e. only regions which contain skin.

  • Threshold the result (there is always some noise :) )

The final result of this algorithm is a black-and-white image which shows skin regions. Using this image you can calculate the number of skin pixels and check whether skin appears only on the face or somewhere else as well. You can try to find contours too; generally both solutions give you a chance to check whether hands are visible. Yes: the shirt has short sleeves; no: long sleeves.
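
To make the decision concrete, a minimal sketch of the skin-pixel check (not from the original answer; the ratio threshold is an assumption to tune):

import numpy as np

def has_short_sleeves(skin_mask, face_rect, ratio_threshold=1.5):
    #skin_mask: binary image (0/255) from the thresholded back projection below
    #face_rect: (x, y, w, h) found by the Haar cascade
    x, y, w, h = face_rect
    total_skin = np.count_nonzero(skin_mask)
    face_skin = np.count_nonzero(skin_mask[y:y+h, x:x+w])
    #if there is clearly more skin than just the face, arms are probably visible
    return total_skin > ratio_threshold * face_skin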

Here are the results (from the top-left corner: original image, face mask (result of the grabcut algorithm), masked face, HSV image, result of calculating the back projection, result of thresholding the previous one):

[Result images for the four test photos]

As you can see, unfortunately it fails for image 3, because the face is a very similar color to the shirt pattern (and generally the face color is quite close to white - something is wrong with this guy, he should spend more time outside ;) ).

The source is quite simple, but if you don't understand something, feel free to ask:

import cv2
import numpy as np


def process_image(img, face_pos, title):
    if len(face_pos) == 0:
        print('No face found!')
        return None
    mask = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8) #create a mask with the same size as the image, but only one channel. The mask is initialized with zeros
    cv2.grabCut(img, mask, tuple(face_pos[0]), np.zeros((1,65), dtype=np.float64), np.zeros((1,65), dtype=np.float64), 1, cv2.GC_INIT_WITH_RECT) #use the grabcut algorithm to find the mask of the face. See the grabcut description for more details (it's a quite complicated algorithm)
    mask = np.where((mask==1) + (mask==3), 255, 0).astype('uint8') #set all pixels == 1 (sure foreground) or == 3 (probable foreground) to 255, set other pixels to 0
    img_masked = cv2.bitwise_and(img, img, mask=mask) #create the masked image - just to show the result of grabcut
    #show images
    cv2.imshow(title, mask)
    cv2.imshow(title+' masked', img_masked)

    img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) #convert the image to hsv
    channels = [0,1]
    hist_sizes = [180, 256] #number of bins for the H and S channels
    channels_ranges = [0, 180, 0, 256] #value ranges of the H and S channels
    histogram = cv2.calcHist([img_hsv], channels, mask, hist_sizes, channels_ranges) #calculate the histogram of the H and S channels of the face only
    histogram = cv2.normalize(histogram, None, 0, 255, cv2.NORM_MINMAX) #normalize the histogram

    dst = cv2.calcBackProject([img_hsv], channels, histogram, channels_ranges, 1) #calculate the back projection (find all pixels with a color similar to the color of the face)
    cv2.imshow(title + ' calcBackProject raw result', dst)

    ret, thresholded = cv2.threshold(dst, 25, 255, cv2.THRESH_BINARY) #threshold the result of the previous step (remove noise etc.)
    cv2.imshow(title + ' thresholded', thresholded)

    cv2.waitKey(5000)
    #put the partial results into one final image
    row1 = np.hstack((img, cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR), img_masked))
    row2 = np.hstack((img_hsv, cv2.cvtColor(dst, cv2.COLOR_GRAY2BGR), cv2.cvtColor(thresholded, cv2.COLOR_GRAY2BGR)))
    return np.vstack((row1, row2))


paths = ['1.jpg', '2.jpg', '3.jpg', '4.jpg']
haar_cascade = cv2.CascadeClassifier('C:\\DevTools\\src\\opencv\\data\\haarcascades\\haarcascade_frontalface_default.xml') #change it to the path to the face cascade - it's inside the opencv folder

for path in paths:
    img = cv2.imread(path)
    face_pos = haar_cascade.detectMultiScale(img, 1.3, 5, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    if len(face_pos) == 0: #if the haar cascade failed to find any face, try again with different (more accurate, but slower) settings
        face_pos = haar_cascade.detectMultiScale(img, 1.1, 3, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    result = process_image(img, face_pos, path)
    if result is not None: #process_image returns None when no face was found
        cv2.imwrite('result_' + path, result) #save the result

Checking the pattern (striped, plain, something else?) and determining the color of the shirt
Here I would try to extract (mask) the shirt from the image and then operate only on it. To achieve this I would try a similar approach to the previous part: the grabcut algorithm. Initializing it might be harder this time. A quite easy (but probably not perfect) solution which comes to mind is the following (a sketch follows the list):

  • set the rect around almost the whole area (leave just a few pixels on each side)

  • initialize the mask to the sure-foreground value in the middle of the image; just draw a circle in the middle using the sure-foreground "color"

  • set the mask to sure background in the corners of the image

  • set the mask to sure background in the face rectangle (the one found using the Haar cascade in the step "Checking whether the image shows a shirt with long or short sleeves")
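
A minimal sketch of that initialization (my own illustration, not from the original answer; the margin, circle radius, and corner size are assumptions to tune):

import cv2
import numpy as np

def segment_shirt(img, face_rect, margin=5):
    h, w = img.shape[:2]
    rect = (margin, margin, w - 2*margin, h - 2*margin) #rect around almost the whole image
    mask = np.full((h, w), cv2.GC_PR_FGD, dtype=np.uint8) #start as "probable foreground"
    cv2.circle(mask, (w//2, h//2), min(w, h)//8, cv2.GC_FGD, -1) #sure foreground circle in the middle
    c = min(w, h)//10
    mask[:c, :c] = mask[:c, -c:] = mask[-c:, :c] = mask[-c:, -c:] = cv2.GC_BGD #sure background in the corners
    x, y, fw, fh = face_rect
    mask[y:y+fh, x:x+fw] = cv2.GC_BGD #sure background on the face rectangle
    bgd_model = np.zeros((1, 65), dtype=np.float64)
    fgd_model = np.zeros((1, 65), dtype=np.float64)
    cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT | cv2.GC_INIT_WITH_MASK)
    return np.where((mask == cv2.GC_FGD) + (mask == cv2.GC_PR_FGD), 255, 0).astype('uint8') #shirt mask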

Alternatively, you can initialize the whole mask as sure foreground or possible foreground and use the watershed algorithm to find the big white area (which is the background). Once you have this area, use it as background (a rough sketch follows).
Most likely using those 2 solutions together will give you the best results.
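
And a rough sketch of one reading of the watershed variant (again my own illustration; seeding the borders as background and a center square as foreground is an assumption):

import cv2
import numpy as np

def watershed_background_mask(img):
    h, w = img.shape[:2]
    markers = np.zeros((h, w), dtype=np.int32)
    markers[0, :] = markers[-1, :] = markers[:, 0] = markers[:, -1] = 1 #seed the borders as region 1 (background)
    r = min(w, h)//8
    markers[h//2 - r:h//2 + r, w//2 - r:w//2 + r] = 2 #seed a center square as region 2 (shirt)
    cv2.watershed(img, markers) #grow both regions; img must be an 8-bit 3-channel image
    return np.where(markers == 1, 255, 0).astype('uint8') #white = the big background area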

You can try a much easier solution as well. It looks like all the images have the SHIRT (not background, skin, or anything else) right around the center. Just like here: [example image of a central shirt patch], so you can just analyze only this part of the shirt. You can try to localize this sure-shirt part of the image using the Haar cascade as well: just find the face and then move the found rectangle down.

Once you have the masked shirt you can calculate its parameters. The 2 things I would try are:

  • convert it to HSV color space and calculate histograms (2 separate ones - not one as we did in the previous step) for the Hue and Saturation channels. Comparing those histograms for 2 shirts should give you a chance to find shirts with similar colors. For comparing the histograms I would use some (normalized) correlation coefficient - see the sketch after this list.

  • use the Fourier transform to see which frequencies are the most common in the shirt. For plain shirts these should be much lower frequencies than for striped ones.
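
A minimal sketch of that histogram comparison (my own illustration; mask1/mask2 are assumed to be the binary shirt masks from grabcut):

import cv2

def shirt_color_similarity(img1, mask1, img2, mask2):
    hsv1 = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)
    hsv2 = cv2.cvtColor(img2, cv2.COLOR_BGR2HSV)
    score = 0.0
    for channel, size, rng in [(0, 180, [0, 180]), (1, 256, [0, 256])]: #Hue, then Saturation
        h1 = cv2.calcHist([hsv1], [channel], mask1, [size], rng)
        h2 = cv2.calcHist([hsv2], [channel], mask2, [size], rng)
        cv2.normalize(h1, h1, 0, 1, cv2.NORM_MINMAX)
        cv2.normalize(h2, h2, 0, 1, cv2.NORM_MINMAX)
        score += cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL) #normalized correlation in [-1, 1]
    return score / 2 #average over the two channels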

I know that those solutions aren't perfect, but I hope this helps. If you have any problems or questions - feel free to ask.


//edit:
I've done some simple pattern comparison using the Fourier transform. The results are... not very good, not very bad - better than nothing, but definitely not perfect ;) I would say it's a good point to start from.
A package with the code and images (yours + some from Google) is here. Code:

import cv2
import numpy as np
from collections import OrderedDict
import operator


def shirt_fft(img, face_pos, title):
    shirt_rect_pos = face_pos[0]
    # print(shirt_rect_pos)
    shirt_rect_pos[1] += 2*shirt_rect_pos[3] #move the rectangle with the face down (by 2 * its height) - now it points at a shirt sample
    shirt_sample = img[shirt_rect_pos[1]:shirt_rect_pos[1]+shirt_rect_pos[3], shirt_rect_pos[0]:shirt_rect_pos[0]+shirt_rect_pos[2]].copy() #crop the shirt sample from the image
    shirt_sample = cv2.resize(shirt_sample, dsize=(256, 256)) #resize the sample to (256, 256)
    # cv2.imshow(title+' shirt sample', shirt_sample)

    shirt_sample_gray = cv2.cvtColor(shirt_sample, cv2.COLOR_BGR2GRAY) #convert to gray colorspace

    f = np.fft.fft2(shirt_sample_gray) #calculate fft
    fshift = np.fft.fftshift(f) #shift - now the brightest point will be in the middle
    # fshift = fshift.astype(np.float32)
    magnitude_spectrum = 20*np.log(np.abs(fshift) + 1e-8) #calculate the magnitude spectrum (it's easier to show); the small epsilon avoids log(0)
    print(magnitude_spectrum.max(), magnitude_spectrum.min(), magnitude_spectrum.mean(), magnitude_spectrum.dtype)
    magnitude_spectrum = cv2.normalize(magnitude_spectrum, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8UC1) #normalize the result and convert it to the 8UC1 (1 channel with 8 bits - unsigned char) datatype
    print(magnitude_spectrum.max(), magnitude_spectrum.min(), magnitude_spectrum.mean(), magnitude_spectrum.dtype)
    # cv2.imshow(title+' fft magnitude', magnitude_spectrum)
    magnitude_spectrum_original = magnitude_spectrum.copy()
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, magnitude_spectrum.max()*0.75, 255.0, cv2.THRESH_TOZERO)
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 125, 255.0, cv2.THRESH_TOZERO)
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 250, 255.0, cv2.THRESH_TOZERO_INV) #clear the brightest part
    temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 200, 255.0, cv2.THRESH_TOZERO) #clear all values from 0 to 200 - removes noise etc.
    # cv2.imshow(title+' fft magnitude thresholded', magnitude_spectrum)
    # cv2.waitKey(1)

    # if chr(cv2.waitKey(5000)) == 'q':
        # quit()

    # return fshift
    return shirt_sample_gray, magnitude_spectrum_original, magnitude_spectrum

paths = ['1.jpg', '2.jpg', '3.jpg', '4.jpg', 'plain1.jpg', 'plain2.jpg', 'plain3.jpg', 'plain4.jpg', 'stripes1.jpg', 'stripes2.jpg']
haar_cascade = cv2.CascadeClassifier('C:\\DevTools\\src\\opencv\\data\\haarcascades\\haarcascade_frontalface_default.xml') #change it to the path to the face cascade - it's inside the opencv folder

fft_dict = OrderedDict()
results_img = None

for path in paths:
    img = cv2.imread(path)
    face_pos = haar_cascade.detectMultiScale(img, 1.3, 5, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    if len(face_pos) == 0: #if the haar cascade failed to find any face, try again with different (more accurate, but slower) settings
        face_pos = haar_cascade.detectMultiScale(img, 1.1, 3, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    if len(face_pos) == 0: #still no face - skip this image instead of crashing
        print('No face found in ' + path)
        continue
    # result = process_image(img, face_pos, path)
    # cv2.imwrite('result_' + path, result) #save the result
    results = shirt_fft(img, face_pos, path)
    if results_img is None:
        results_img = np.hstack(results)
    else:
        results_img = np.vstack((results_img, np.hstack(results)))
    fft_dict[path] = results[2]

similarity_dict = {}
cv2.imshow('results_img', results_img)
cv2.waitKey(1)


#for each image, calculate the value of its correlation with each other image
keys = list(fft_dict.keys())
for i in range(len(keys)):
    for j in range(i+1, len(keys)):
    # for j in range(i, len(keys)):
        key1, key2 = keys[i], keys[j]
        print('pair: ', key1, key2)
        img1 = fft_dict[key1]
        img2 = fft_dict[key2].copy()
        # img2 = img2[10:246, 10:246]
        correlation = cv2.matchTemplate(img1, img2, cv2.TM_CCORR_NORMED)
        # correlation = cv2.matchTemplate(img1, img2, cv2.TM_SQDIFF_NORMED)
        # print(correlation)
        print(correlation.shape, correlation.dtype, correlation.max())
        similarity_dict[key1 + ' - ' + key2] = correlation.max()
        # similarity_dict[key1 + ' - ' + key2] = correlation

#sort values (from best to worst matches)
sorted_similarity_dict = sorted(similarity_dict.items(), key=operator.itemgetter(1), reverse=True)
print("final result: ")
for a in sorted_similarity_dict:
    print(a)


cv2.waitKey(50000)

Some lines are commented out - you can try to use them; maybe you will achieve better results.
The basic algorithm is quite simple - for each image:

  • cut a shirt sample from the image (just move the rectangle with the face down by 2 * its height)

  • convert this rect to gray colorspace and resize it to (256, 256)

  • calculate the fft of this sample

  • calculate the magnitude spectrum of the fft transform

  • normalize it (from 0 to 255)

  • threshold it (clear all values < 200) - this will remove noise etc.

Now we can calculate the normalized cross-correlation of this image with all the shirt samples. A high result -> similar samples. Final results:

('plain1.jpg - plain3.jpg', 1.0)  
('plain3.jpg - plain4.jpg', 1.0)  
('plain1.jpg - plain4.jpg', 1.0)  
('stripes1.jpg - stripes2.jpg', 0.54650664)  
('1.jpg - 3.jpg', 0.52512592)  
('plain1.jpg - stripes1.jpg', 0.45395589)  
('plain3.jpg - stripes1.jpg', 0.45395589)  
('plain4.jpg - stripes1.jpg', 0.45395589)  
('plain1.jpg - plain2.jpg', 0.39764369)  
('plain2.jpg - plain4.jpg', 0.39764369)  
('plain2.jpg - plain3.jpg', 0.39764369)  
('2.jpg - stripes1.jpg', 0.36927304)  
('2.jpg - plain3.jpg', 0.35678366)  
('2.jpg - plain4.jpg', 0.35678366)  
('2.jpg - plain1.jpg', 0.35678366)  
('1.jpg - plain1.jpg', 0.28958824)  
('1.jpg - plain3.jpg', 0.28958824)  
('1.jpg - plain4.jpg', 0.28958824)  
('2.jpg - 3.jpg', 0.27775836)  
('4.jpg - plain3.jpg', 0.2560707)  
('4.jpg - plain1.jpg', 0.2560707)  
('4.jpg - plain4.jpg', 0.2560707)  
('3.jpg - stripes1.jpg', 0.25498456)  
('4.jpg - plain2.jpg', 0.24522379)  
('1.jpg - 2.jpg', 0.2445447)  
('plain4.jpg - stripes2.jpg', 0.24032137)  
('plain3.jpg - stripes2.jpg', 0.24032137)  
('plain1.jpg - stripes2.jpg', 0.24032137)  
('3.jpg - stripes2.jpg', 0.23217434)  
('plain2.jpg - stripes2.jpg', 0.22518013)  
('2.jpg - stripes2.jpg', 0.19549081)  
('plain2.jpg - stripes1.jpg', 0.1805127)  
('3.jpg - plain4.jpg', 0.14908621)  
('3.jpg - plain1.jpg', 0.14908621)  
('3.jpg - plain3.jpg', 0.14908621)  
('4.jpg - stripes2.jpg', 0.14738286)  
('2.jpg - plain2.jpg', 0.14187276)  
('3.jpg - 4.jpg', 0.13638313)  
('1.jpg - stripes1.jpg', 0.13146029)  
('4.jpg - stripes1.jpg', 0.11624481)  
('1.jpg - plain2.jpg', 0.11515292)  
('2.jpg - 4.jpg', 0.091361843)  
('1.jpg - 4.jpg', 0.074155055)  
('1.jpg - stripes2.jpg', 0.069594234)  
('3.jpg - plain2.jpg', 0.059283193)  

An image with all the shirt samples and magnitude spectra (before and after thresholding) is here: [combined image of all samples and their spectra]

The image names are (in the same order as the samples in this big image): ['1.jpg', '2.jpg', '3.jpg', '4.jpg', 'plain1.jpg', 'plain2.jpg', 'plain3.jpg', 'plain4.jpg', 'stripes1.jpg', 'stripes2.jpg']. As you can see, the thresholded images are quite similar for samples with the same pattern. I think this solution could work better if you just found a better way to compare those images (the thresholded magnitude spectra).

edit2:
Just a simple idea - after you crop shirt samples from a lot of shirts, you can try to train some classifier and then recognize their patterns using this classifier. Look for tutorials about training Haar or LBP (local binary pattern) cascades.
