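I. Overview
SIFT Flow is an annotated semantic segmentation dataset. Each image carries two sets of labels: semantic classes (33 classes) and scene (geometric) labels (3 classes).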
Semantic and geometric segmentation classes for scenes.
Semantic: 0 is void and 1–33 are classes.
01 awning
02 balcony
03 bird
04 boat
05 bridge
06 building
07 bus
08 car
09 cow
10 crosswalk
11 desert
12 door
13 fence
14 field
15 grass
16 moon
17 mountain
18 person
19 plant
20 pole
21 river
22 road
23 rock
24 sand
25 sea
26 sidewalk
27 sign
28 sky
29 staircase
30 streetlight
31 sun
32 tree
33 window
Geometric: -1 is void and 1–3 are classes.
01 sky
02 horizontal
03 vertical
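For quick lookups when inspecting label maps, the two class lists above can be written as a small Python table. This is only a sketch: the names SEMANTIC_CLASSES, GEOMETRIC_CLASSES, semantic_name and geometric_name are made up for illustration and are not part of the dataset or of the FCN repository.

```python
# Class names copied from the lists above; position i holds class ID i + 1.
SEMANTIC_CLASSES = [
    "awning", "balcony", "bird", "boat", "bridge", "building", "bus", "car",
    "cow", "crosswalk", "desert", "door", "fence", "field", "grass", "moon",
    "mountain", "person", "plant", "pole", "river", "road", "rock", "sand",
    "sea", "sidewalk", "sign", "sky", "staircase", "streetlight", "sun",
    "tree", "window",
]
GEOMETRIC_CLASSES = ["sky", "horizontal", "vertical"]

SEMANTIC_VOID = 0     # semantic labels: 0 is void
GEOMETRIC_VOID = -1   # geometric labels: -1 is void


def _lookup(label_id, names, void_id):
    """Map a raw dataset label ID to its name, or 'void'."""
    if label_id == void_id:
        return "void"
    if not 1 <= label_id <= len(names):
        raise ValueError("label ID out of range: {}".format(label_id))
    return names[label_id - 1]


def semantic_name(label_id):
    return _lookup(label_id, SEMANTIC_CLASSES, SEMANTIC_VOID)


def geometric_name(label_id):
    return _lookup(label_id, GEOMETRIC_CLASSES, GEOMETRIC_VOID)


print(semantic_name(28), geometric_name(2))
```

The IDs here are the raw dataset IDs (1-based). Whether a network's output indices use the same numbering depends on how the labels were prepared for training.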
II. Model training
1. Download the source code
git clone git@github.com:shelhamer/fcn.berkeleyvision.org.git
2. Prepare the data
Download the annotated dataset SiftFlowDataset.zip from http://www.cs.unc.edu/~jtighe/Papers/ECCV10/siftflow/SiftFlowDataset.zip
Extract the archive into the data/sift-flow folder.
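The extraction step can also be scripted. The following is a minimal sketch using only the Python 3 standard library; the helper name extract_dataset is hypothetical, and the paths assume the archive sits in the current directory and that data/sift-flow is relative to it.

```python
import os
import zipfile


def extract_dataset(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the top-level entries."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


if __name__ == "__main__":
    # Assumed locations; adjust to where the archive was actually saved.
    archive_path = "SiftFlowDataset.zip"
    if os.path.exists(archive_path):
        print(extract_dataset(archive_path, "data/sift-flow"))
```

Printing the top-level entries makes it easy to check that the extracted folders ended up directly under data/sift-flow rather than one level deeper.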
3. Modify the code
git clone git@github.com:litingpan/fcn.git
Alternatively, download it from https://github.com/litingpan/fcn and replace the entire siftflow-fcn32s folder with it.
solve.py is modified as follows:
import caffe
import surgery, score

import numpy as np
import os
import sys

try:
    import setproctitle
    setproctitle.setproctitle(os.path.basename(os.getcwd()))
except:
    pass

# weights = '../ilsvrc-nets/vgg16-fcn.caffemodel'
vgg_weights = '../ilsvrc-nets/VGG_ILSVRC_16_layers.caffemodel'
vgg_proto = '../ilsvrc-nets/VGG_ILSVRC_16_layers_deploy.prototxt'

# init
# caffe.set_device(int(sys.argv[1]))
caffe.set_device(0)
caffe.set_mode_gpu()

# solver = caffe.SGDSolver('solver.prototxt')
# solver.net.copy_from(weights)
solver = caffe.SGDSolver('solver.prototxt')
vgg_net = caffe.Net(vgg_proto, vgg_weights, caffe.TRAIN)
surgery.transplant(solver.net, vgg_net)
del vgg_net

# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)

# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)

for _ in range(50):
    solver.step(2000)
    # N.B. metrics on the semantic labels are off b.c. of missing classes;
    # score manually from the histogram instead for proper evaluation
    score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
    score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
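The N.B. comment in solve.py says to score the semantic labels manually from the histogram because some classes are missing. One possible way to do that is sketched below with NumPy alone. It is not a copy of the repository's score.py: the function names and the choice to average only over classes that occur in the ground truth are assumptions for illustration.

```python
import numpy as np


def confusion_hist(gt, pred, n_classes):
    """Confusion matrix: rows are ground truth, columns are predictions.

    Pixels whose ground-truth label is outside [0, n_classes), such as a
    void label, are dropped. Predictions are assumed to lie in range.
    """
    gt = np.asarray(gt).ravel().astype(np.int64)
    pred = np.asarray(pred).ravel().astype(np.int64)
    keep = (gt >= 0) & (gt < n_classes)
    index = n_classes * gt[keep] + pred[keep]
    counts = np.bincount(index, minlength=n_classes ** 2)
    return counts.reshape(n_classes, n_classes)


def scores_from_hist(hist):
    """Pixel accuracy, mean accuracy and mean IU over present classes only."""
    hist = hist.astype(np.float64)
    true_pos = np.diag(hist)
    gt_count = hist.sum(axis=1)
    pred_count = hist.sum(axis=0)
    present = gt_count > 0  # skip classes absent from the ground truth
    union = gt_count + pred_count - true_pos
    return {
        "pixel_acc": true_pos.sum() / hist.sum(),
        "mean_acc": (true_pos[present] / gt_count[present]).mean(),
        "mean_iu": (true_pos[present] / union[present]).mean(),
    }


# Tiny example: 3 classes, one void pixel (255), class 2 never occurs.
gt = np.array([0, 0, 1, 1, 255])
pred = np.array([0, 1, 1, 1, 0])
print(scores_from_hist(confusion_hist(gt, pred, 3)))
```

Restricting the averages to classes that actually appear avoids dividing by zero for classes with no test pixels, which is one plausible reading of why the automatic metrics are reported as "off".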
4. Download the pretrained model
ILSVRC-2014 model (VGG team) with 16 weight layers: https://gist.github.com/ksimonyan/211839e770f7b538e2d8/revisions
Download both VGG_ILSVRC_16_layers.caffemodel and VGG_ILSVRC_16_layers_deploy.prototxt and place them in the ilsvrc-nets directory.
5. Train
python solve.py
When training finishes, train_iter_100000.caffemodel in the snapshot directory is the trained model.
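The iteration count in that file name follows from the loop in solve.py. As a small worked check, assuming the snapshot prefix in solver.prototxt is snapshot/train (the prefix is not shown in this post, so treat it as an assumption):

```python
OUTER_LOOPS = 50        # from `for _ in range(50)` in solve.py
STEPS_PER_LOOP = 2000   # from `solver.step(2000)` in solve.py
SNAPSHOT_PREFIX = "snapshot/train"  # assumed value of snapshot_prefix

total_iterations = OUTER_LOOPS * STEPS_PER_LOOP
final_snapshot = "{}_iter_{}.caffemodel".format(SNAPSHOT_PREFIX, total_iterations)
print(total_iterations, final_snapshot)
```

Changing either number in solve.py changes the name of the final snapshot accordingly.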
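III. Prediction
1. Prepare the model
You can use the model trained above. If you would rather not train it yourself, download the trained model directly from http://dl.caffe.berkeleyvision.org/siftflow-fcn32s-heavy.caffemodel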
2. deploy.prototxt
This file is adapted from test.prototxt, with three main changes:
(1) Input layer
layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    # These dimensions are purely for sake of example;
    # see infer.py for how to reshape the net to the given input size.
    shape { dim: 1 dim: 3 dim: 256 dim: 256 }
  }
}
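Note that the dimensions in the Input layer should match the size of the image under test (infer.py also reshapes the data blob to the loaded image at run time).
(2) The dropout layers were removed.
(3) The loss layers and the layers that depend on them were removed.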
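For item (1), the four dim values are N x C x H x W. A minimal sketch of deriving them from an image array is shown below; input_dims and input_shape_line are hypothetical helper names, and the array is assumed to be in the H x W x C layout that np.array(Image.open(...)) gives for an RGB image.

```python
import numpy as np


def input_dims(image_hwc):
    """Return (N, C, H, W) for a single H x W x C image array."""
    height, width, channels = np.asarray(image_hwc).shape
    return (1, channels, height, width)


def input_shape_line(dims):
    """Format the dims as the shape line used in the Input layer."""
    return "shape { " + " ".join("dim: {}".format(d) for d in dims) + " }"


# A 256 x 256 RGB image, the size used by the SIFT Flow images.
dummy_image = np.zeros((256, 256, 3), dtype=np.uint8)
print(input_shape_line(input_dims(dummy_image)))
```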
3. infer.py
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import sys
import caffe

# the demo image here is coast_bea14.jpg
# load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
im = Image.open('coast_bea14.jpg')
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= np.array((104.00698793,116.66876762,122.67891434))
in_ = in_.transpose((2,0,1))

# load net
net = caffe.Net('deploy.prototxt', 'snapshot/train_iter_100000.caffemodel', caffe.TEST)
# shape for input (data blob is N x C x H x W), set data
net.blobs['data'].reshape(1, *in_.shape)
net.blobs['data'].data[...] = in_
# run net and take argmax for prediction
net.forward()

# semantic segmentation output
sem_out = net.blobs['score_sem'].data[0].argmax(axis=0)
# plt.imshow(out,cmap='gray');
plt.imshow(sem_out)
plt.axis('off')
plt.savefig('coast_bea14_sem_out.png')
sem_out_img = Image.fromarray(sem_out.astype('uint8')).convert('RGB')
sem_out_img.save('coast_bea14_sem_img_out.png')

# scene (geometric) labeling output
geo_out = net.blobs['score_geo'].data[0].argmax(axis=0)
plt.imshow(geo_out)
plt.axis('off')
plt.savefig('coast_bea14_geo_out.png')
geo_out_img = Image.fromarray(geo_out.astype('uint8')).convert('RGB')
geo_out_img.save('coast_bea14_geo_img_out.png')
Here sem_out_img holds the semantic segmentation result and geo_out_img holds the scene labeling result.
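The preprocessing at the top of infer.py (RGB to BGR, mean subtraction, H x W x C to C x H x W) can be isolated into a function and checked without Caffe. This is a sketch under the assumption that the input is an RGB array as returned by np.array(Image.open(...)); the name preprocess is made up for illustration.

```python
import numpy as np

# Per-channel mean in BGR order, the same values as in infer.py.
MEAN_BGR = np.array((104.00698793, 116.66876762, 122.67891434),
                    dtype=np.float32)


def preprocess(rgb_hwc):
    """Convert an H x W x 3 RGB image into a C x H x W, mean-subtracted BGR array."""
    arr = np.asarray(rgb_hwc, dtype=np.float32)
    bgr = arr[:, :, ::-1]        # reverse the channel axis: RGB -> BGR
    centered = bgr - MEAN_BGR    # broadcasts over H and W
    return centered.transpose((2, 0, 1))


# A pure red 4 x 5 image: only the last BGR channel is non-zero before centering.
red = np.zeros((4, 5, 3), dtype=np.uint8)
red[..., 0] = 255
print(preprocess(red).shape)
```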
4. Test
python infer.py
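All images in SIFT Flow are 256×256×3 color images.
The images folder holds the image data. semanticlabels holds the semantic segmentation labels, 33 classes in total (the annotations contain one extra void class). geolabels holds the scene (geometric) labels, 3 classes in total (again with one extra void class).
In effect, two networks are trained, one per label set, and they share the same first seven layers.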
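Because the void value differs between the two label sets (0 for semantic, -1 for geometric), a common preparation step is to shift the real classes to start at 0 and send void to an ignore value. The sketch below shows the idea with NumPy. The ignore value 255 and the function name to_training_labels are assumptions; check the data layer and the loss settings in the repository for what is actually used.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed ignore value; verify against the loss layer


def to_training_labels(raw):
    """Map class IDs 1..K to 0..K-1 and every void ID (0 or -1) to IGNORE_LABEL."""
    raw = np.asarray(raw)
    out = np.full(raw.shape, IGNORE_LABEL, dtype=np.uint8)
    valid = raw >= 1               # both void values are below 1
    out[valid] = raw[valid] - 1
    return out


semantic = np.array([[0, 1], [33, 28]])   # 0 is void
geometric = np.array([-1, 1, 2, 3])       # -1 is void
print(to_training_labels(semantic))
print(to_training_labels(geometric))
```

If labels are shifted this way, a predicted index k from argmax corresponds to dataset class ID k + 1.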
coast_bea14_sem_out.png is the semantic segmentation result, and coast_bea14_geo_out.png is the scene labeling result.
Figure (left to right): original image, semantic segmentation, scene labeling.
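The label images that infer.py saves with Image.fromarray(...).convert('RGB') store small class indices as pixel values, so they tend to look almost black in an image viewer. One way to get a readable picture is to map each index to a color. The palette below is random and purely illustrative; make_palette and colorize are hypothetical helpers, not part of the repository.

```python
import numpy as np


def make_palette(n_classes, seed=0):
    """Return an (n_classes, 3) uint8 array of reproducible random colors."""
    rng = np.random.RandomState(seed)
    return rng.randint(0, 256, size=(n_classes, 3)).astype(np.uint8)


def colorize(label_map, palette):
    """Turn an H x W array of class indices into an H x W x 3 color image."""
    return palette[np.asarray(label_map)]


palette = make_palette(33)
labels = np.array([[0, 32], [5, 5]])
print(colorize(labels, palette).shape)
```

With PIL available, Image.fromarray(colorize(sem_out, palette)).save('out.png') would write the colored map, where out.png is a placeholder file name.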