Improving classification accuracy with an image pyramid and sliding windows

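Earlier I fine-tuned an ImageNet-pretrained model and worked out a number of things to watch for when selecting samples. When I tested the photo below, the result was poor. I believed this was because the object was too small in the frame, so I tried the image pyramid approach: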

The result at the time was:

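At first I planned to use net_full_conv.blobs['data'].reshape(1,3,scale_img.shape[1],scale_img.shape[0]), but it kept failing with "Check failed: K_ == new_K (9216 vs. 20736) Input size incompatible with inner product parameters". After searching online for a while, the cause seems to be this: K_ is the number of input neurons of the fully connected layer, equal to channels*H*W, and inner_product_layer.cpp checks that K_ equals new_K, aborting with "Input size incompatible with inner product parameters" when they differ. K_ is the value fixed when the layer is first set up. I trained the model with a height and width of 227, and that input reaches the fully connected layer as a feature map of height H and width W. Reshaping the data blob to the size of the scaled image changes H and W, so the new inner product input size no longer matches the original one and the check fails.

(The same statement raises no error in face-box detection because that network has no fully connected layer: the convolutional layers feed straight into the softmax layer, and the last convolutional layer has 2 outputs, so there are no inner product parameters to trip the check. I found a similar issue on GitHub where someone said that removing the inner product layers fixed it, but I could not get that to work. Link: https://github.com/jacobandreas/nmn2/issues/17, in case you run into a similar problem.)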
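The two numbers in the error message can be reproduced by hand. The sketch below assumes the fine-tuned model uses the standard CaffeNet/AlexNet layer layout (which the 227 input and the value 9216 are consistent with, but which the post does not state), and the 323x323 input is only a hypothetical size chosen because it yields 20736. The helper names are illustrative, not part of Caffe.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution layers round the output size down
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=3, stride=2):
    # Caffe pooling layers round the output size up (no padding assumed here)
    return int(math.ceil((size - kernel) / float(stride))) + 1

def fc6_input_size(height, width, channels=256):
    """K_ = channels * H * W at the first fully connected layer,
    assuming the CaffeNet/AlexNet layer layout (an assumption, see above)."""
    dims = []
    for size in (height, width):
        size = pool_out(conv_out(size, kernel=11, stride=4))  # conv1 + pool1
        size = pool_out(conv_out(size, kernel=5, pad=2))      # conv2 + pool2
        for _ in range(3):                                    # conv3, conv4, conv5
            size = conv_out(size, kernel=3, pad=1)
        dims.append(pool_out(size))                           # pool5
    return channels * dims[0] * dims[1]

print(fc6_input_size(227, 227))  # 256 * 6 * 6 = 9216
print(fc6_input_size(323, 323))  # 256 * 9 * 9 = 20736
```

Under that assumption, any input larger than 227x227 produces a pool5 map larger than 6x6, which is why the fully connected weights trained for 9216 inputs no longer fit.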

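My approach is to build an image pyramid from the original image first and then run sliding windows over each level, with a stride of 50 and a window size of 227. Every crop then has the size the network was trained on, so the check error no longer appears.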

from __future__ import print_function  # lets print() behave the same on Python 2 and 3

import caffe
import cv2
import numpy as np

# net, labels and caffe_root are assumed to be defined earlier in the script
# (the loaded fine-tuned model, the class names, and the Caffe install path).

def jinzita(imgfile):
    term_1 = []  # top-1 probability of every window
    term_2 = []  # top-1 label of every window
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})  # set the input blob shape
    transformer.set_mean('data', np.load(caffe_root +
                                         '/python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))  # subtract the mean
    transformer.set_transpose('data', (2, 0, 1))  # move image channels to outermost dimension
    transformer.set_channel_swap('data', (2, 1, 0))  # swap channels from RGB to BGR
    transformer.set_raw_scale('data', 255.0)  # rescale from [0, 1] to [0, 255]
    scales = []  # the scales that make up the image pyramid
    factor = 0.793700526  # factor by which the image is shrunk between levels (empirical value)
    image = cv2.imread(imgfile)  # read the test image
    largest = min(2, 4000.0 / max(image.shape[0:2]))  # largest scale; float division so the cap is not truncated
    scale = largest
    minD = largest * min(image.shape[0:2])  # shorter side at the largest scale
    while minD >= 227:  # keep adding scales while the shorter side still fits a 227-pixel window
        scales.append(scale)
        scale *= factor
        minD *= factor
    for scale in scales:
        print(scale)
        x1 = 0
        y1 = 0
        # cv2.resize takes (width, height); shape[0] is the height, so shape[1] goes first
        scale_img = cv2.resize(image, (int(image.shape[1] * scale), int(image.shape[0] * scale)))
        cv2.imwrite('/media/zhaofan/Myfile/caffe/data/finetune_1/scale/scale.jpg', scale_img)
        while (y1 + 227) <= scale_img.shape[0]:
            while (x1 + 227) <= scale_img.shape[1]:
                img = scale_img[y1:y1 + 227, x1:x1 + 227]
                cv2.imwrite('/media/zhaofan/Myfile/caffe/data/finetune_1/scale/crop_img.jpg', img)
                im = caffe.io.load_image('/media/zhaofan/Myfile/caffe/data/finetune_1/scale/crop_img.jpg')  # values are floats between 0 and 1
                output = net.forward(data=np.asarray([transformer.preprocess('data', im)]))
                output_prob = output['prob'][0]  # 1-D array holding the probability of each class
                top_inds = output_prob.argsort()[::-1][:5]
                term_1.append(output_prob[top_inds[0]])
                term_2.append(labels[top_inds[0]])
                x1 += 50
            x1 = 0
            y1 += 50
    inds = term_1.index(max(term_1))  # window with the highest top-1 probability
    print(term_1[inds], term_2[inds])
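Since every window costs one forward pass, it can help to know how many crops a given image will produce before running the network. The following is a minimal sketch that reproduces only the scale and window arithmetic of the function above, without Caffe or OpenCV. The helper names (pyramid_scales, window_origins, count_crops) are hypothetical, and the constants are copied from the code above.

```python
def pyramid_scales(height, width, window=227, factor=0.793700526,
                   max_scale=2.0, max_side=4000.0):
    """Scales of the pyramid: start at the largest scale and shrink by
    `factor` while the shorter side still fits one window."""
    scale = min(max_scale, max_side / max(height, width))
    min_side = scale * min(height, width)
    scales = []
    while min_side >= window:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales

def window_origins(height, width, window=227, stride=50):
    """Top-left (y, x) corners of every window that fits inside the image."""
    return [(y, x)
            for y in range(0, height - window + 1, stride)
            for x in range(0, width - window + 1, stride)]

def count_crops(height, width):
    """Total number of forward passes for an image of the given size."""
    total = 0
    for s in pyramid_scales(height, width):
        total += len(window_origins(int(height * s), int(width * s)))
    return total

print(len(pyramid_scales(375, 500)))  # number of pyramid levels for a 500x375 image
print(count_crops(375, 500))          # number of 227x227 crops across all levels
```

An image whose shorter side is below 114 pixels yields no scales at all (twice the shorter side is still under 227), in which case the function above would have no windows to choose from.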

The final result:

0.99585 2 bird

A big improvement over the earlier result!

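(One more thing to watch: when you crop, save, or otherwise manipulate images, use the same library throughout, for example all PIL, all cv, or some other single library. Mixing them leads to errors.)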
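The post does not say which errors occurred. One common source of trouble when mixing libraries is that they disagree on channel order and value range: cv2.imread returns BGR uint8 arrays, while caffe.io.load_image returns RGB floats in [0, 1]. As a sketch under that assumption, an OpenCV crop can be converted in memory to the format caffe.io.load_image produces. The function name below is hypothetical.

```python
import numpy as np

def bgr_uint8_to_rgb_float(crop_bgr):
    """Convert an OpenCV-style crop (H x W x 3, BGR, uint8) into
    RGB float32 values in [0, 1]."""
    rgb = crop_bgr[:, :, ::-1]              # reverse the channel axis: BGR -> RGB
    return rgb.astype(np.float32) / 255.0   # rescale from [0, 255] to [0, 1]

# Synthetic pure-blue crop in BGR order, used only for illustration
crop = np.zeros((227, 227, 3), dtype=np.uint8)
crop[:, :, 0] = 255
converted = bgr_uint8_to_rgb_float(crop)
print(converted.shape, converted.dtype, converted[0, 0])
```

The result will not be bit-identical to a crop that was written to a JPEG file and read back, because JPEG compression is lossy.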

Source code: https://pan.baidu.com/s/1jHThUcA (password: pkra)
