Comparing multiple histograms stored in a file with OpenCV

I have a dataset of images. I compute the histogram of every image and want to store (write) the histograms in a file, so that for every new input image I can compare its histogram with the ones already in the file and find out whether any of them is identical. The code so far is this:

import numpy as np
import cv2
import os.path
import glob
import matplotlib.pyplot as plt
import pickle

index = {}

#output dic
out = {
    1: {},
    2: {},
    3: {},
}

for t in [1]:

    #load_files
    files = glob.glob(os.path.join("..", "data", "train", "Type_{}".format(t), "*.jpg"))
    no_files = len(files)

    #iterate and read
    for n, file in enumerate(files):
        try:
            image = cv2.imread(file)
            img = cv2.resize(image, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)

            # features : histograms
            plt.hist(img.flatten(), 256, [0, 256], color='r')
            plt.xlim([0,256])
            plt.legend('histogram', loc='upper left')
            plt.show()
            # index[file] = hist

            # write histograms into file
            #compare them and find similarity score
            # result_dist = compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)

            print(file, t, "-files left", no_files - n)

        except Exception as e:
            print(e)
            print(file)

Can someone guide me through this? Thanks!

1 Answer

#1

You could compute the red channel histogram of all the images like this:

import os
import glob
import numpy as np
from skimage import io

root = r'C:\Users\you\imgs'  # Raw string so the backslashes are not treated as escapes; change this appropriately
folders = ['Type_1', 'Type_2', 'Type_3']
extension = '*.bmp'  # Change if necessary

def compute_red_histograms(root, folders, extension):
    X = []
    y = []
    for n, imtype in enumerate(folders):
        filenames = glob.glob(os.path.join(root, imtype, extension))    
        for fn in filenames:
            img = io.imread(fn)
            red = img[:, :, 0]
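            # density=True replaces the removed normed=True argument;
            # with unit-width bins the histogram values sum to 1
            h, _ = np.histogram(red, bins=np.arange(257), density=True)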
            X.append(h)
            y.append(n)
    return np.vstack(X), np.array(y)

X, y = compute_red_histograms(root, folders, extension)

Each image is represented by a 256-dimensional feature vector (the components of the red channel histogram), hence X is a 2D NumPy array with as many rows as there are images in your dataset and 256 columns. y is a 1D NumPy array with numeric class labels, i.e. 0 for Type_1, 1 for Type_2 and 2 for Type_3.
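
If the goal is also to keep these histograms in a file and compare a new image against them, something along these lines might work. This is only a sketch: the helper names (save_histograms, load_histograms, correlation_scores) and the file name are made up for illustration, and random data stands in for the real X and y. The score is the Pearson correlation, which is the quantity that cv2.compareHist returns with the cv2.HISTCMP_CORREL method, so a score of 1 means the two histograms match up to scale and offset.

```python
import os
import tempfile

import numpy as np


def save_histograms(path, X, y):
    """Store the histogram matrix X and the labels y in a single .npz file."""
    np.savez_compressed(path, X=X, y=y)


def load_histograms(path):
    """Read back the arrays written by save_histograms."""
    with np.load(path) as data:
        return data["X"], data["y"]


def correlation_scores(X, h):
    """Pearson correlation between histogram h and every row of X.

    Assumes no histogram is perfectly flat (that would divide by zero).
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    hc = h - h.mean()
    return (Xc @ hc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(hc))


# Stand-in data: 5 random normalised "histograms" with 256 bins each.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 256))
X_demo /= X_demo.sum(axis=1, keepdims=True)
y_demo = np.array([0, 0, 1, 1, 2])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "histograms.npz")  # hypothetical file name
    save_histograms(path, X_demo, y_demo)
    X_loaded, y_loaded = load_histograms(path)

# Pretend the new image has the same histogram as stored image 3.
query = X_demo[3]
scores = correlation_scores(X_loaded, query)
best = int(np.argmax(scores))
print("best match: row", best, "score", scores[best])
```

For real data, query would be the histogram of the new image, computed in the same way as the rows of X.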

每个图像通过256维特征向量(红色通道直方图的分量)表示,因此X是2D NumPy数组,其行数与数据集中的图像数为256列。 y是带有数字类标签的1D NumPy数组,即Type_1为0,Type_2为1,Type_3为2。

Next you could split your dataset into train and test like so:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
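
To see what the split produces, here is a small sketch on made-up data (12 fake histograms, 4 per class). Note that stratify=y_demo is an addition that is not in the call above; it is an optional argument that keeps the class proportions the same in both halves, which may or may not be what you want.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 12 fake histograms, 4 for each of the 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.random((12, 256))
y_demo = np.repeat([0, 1, 2], 4)

# test_size=0.5 puts half of the samples in each part;
# random_state=0 makes the shuffle reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=0, stratify=y_demo)

print(X_tr.shape, X_te.shape)                # rows per part, 256 columns each
print(np.bincount(y_tr), np.bincount(y_te))  # samples per class in each part
```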

And finally, you could train an SVM classifier:

from sklearn.svm import SVC

clf = SVC()
clf.fit(X_train, y_train)

By doing so you can make predictions or assess classification accuracy very easily:

In [197]: y_test
Out[197]: array([0, 2, 0, ..., 0, 0, 1])

In [198]: clf.predict(X_test)
Out[198]: array([2, 2, 2, ..., 2, 2, 2])

In [199]: y_test == clf.predict(X_test)
Out[199]: array([False,  True, False, ..., False, False, False], dtype=bool)

In [200]: clf.score(X_test, y_test)
Out[200]: 0.3125
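
The value returned by clf.score is the mean accuracy, i.e. the fraction of True entries in y_test == clf.predict(X_test). The sketch below checks this on synthetic data (not the dataset from the question; the names ending in _demo are invented for the example) and also prints a confusion matrix, which can help spot a classifier that predicts mostly one class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

# Synthetic stand-in: 3 classes, 20 samples each, 256 features,
# shifted per class so that the classes differ.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1, 2], 20)
X_demo = rng.random((60, 256)) + y_demo[:, None]

# Simple alternating split: even rows for training, odd rows for testing.
train = np.arange(60) % 2 == 0
clf_demo = SVC().fit(X_demo[train], y_demo[train])

pred = clf_demo.predict(X_demo[~train])
acc_manual = np.mean(pred == y_demo[~train])                # fraction of correct predictions
acc_score = clf_demo.score(X_demo[~train], y_demo[~train])  # same quantity via score()

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_demo[~train], pred)
print(acc_manual, acc_score)
print(cm)
```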
