Deep Learning in Practice: Training Faster R-CNN on the Caltech Pedestrian Detection Dataset (Preparation)

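Faster R-CNN is an object detection framework proposed by Ross Girshick and colleagues as a successor to Fast R-CNN. It is faster and achieves a higher mAP. Its main improvement over Fast R-CNN is in the region proposal stage: it introduces a Region Proposal Network (RPN) to generate the proposals, and the RPN shares the full-image convolutional features with the detection network, which makes region proposals nearly cost-free.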

The Faster R-CNN code is open source and comes in two versions: a MATLAB version (faster_rcnn) and a Python version (py-faster-rcnn).

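I mainly use the Python version here. At test time it is about 10% slower than the MATLAB version, because some operations in the Python layers run on the CPU, but the accuracy should be about the same.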



  1. Clone the Faster R-CNN repository:

    git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git


  2. Build the Cython modules:

    cd py-faster-rcnn/lib
    make
  3. Build the bundled Caffe and pycaffe:

    cd py-faster-rcnn/caffe-fast-rcnn
    # Build it the same way as a regular Caffe installation.
    # Remember to edit Makefile.config first; installing Caffe itself is not covered here.
    make -j8 && make pycaffe
  4. Here is my Makefile.config; adjust it to match your own setup:

    ## Refer to http://caffe.berkeleyvision.org/installation.html

    # Contributions simplifying and improving our build system are welcome!

    # cuDNN acceleration switch (uncomment to build with cuDNN).

    USE_CUDNN := 1

    # CPU-only switch (uncomment to build without GPU support).

    # CPU_ONLY := 1

    # uncomment to disable IO dependencies and corresponding data layers

    # USE_OPENCV := 0

    # USE_LEVELDB := 0

    # USE_LMDB := 0

    # uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)

    # You should not set this flag if you will be reading LMDBs with any

    # possibility of simultaneous read and write


    # Uncomment if you're using OpenCV 3


    # To customize your choice of compiler, uncomment and set the following.

    # N.B. the default for Linux is g++ and the default for OSX is clang++

    # CUSTOM_CXX := g++

    # CUDA directory contains bin/ and lib/ directories that we need.

    CUDA_DIR := /usr/local/cuda

    # On Ubuntu 14.04, if cuda tools are installed via

    # "sudo apt-get install nvidia-cuda-toolkit" then use this instead:

    # CUDA_DIR := /usr

    # CUDA architecture setting: going with all of them.

    # For CUDA < 6.0, comment the *_50 lines for compatibility.

    CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
    -gencode arch=compute_20,code=sm_21 \
    -gencode arch=compute_30,code=sm_30 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_50,code=compute_50

    # BLAS choice:

    # atlas for ATLAS (default)

    # mkl for MKL

    # open for OpenBlas

    BLAS :=mkl

    # Custom (MKL/ATLAS/OpenBLAS) include and lib directories.

    # Leave commented to accept the defaults for your choice of BLAS

    # (which should work)!

    # BLAS_INCLUDE := /path/to/your/blas

    # BLAS_LIB := /path/to/your/blas

    # Homebrew puts openblas in a directory that is not on the standard search path

    # BLAS_INCLUDE := $(shell brew --prefix openblas)/include

    # BLAS_LIB := $(shell brew --prefix openblas)/lib

    # This is required only if you will compile the matlab interface.

    # MATLAB directory should contain the mex binary in /bin.

    MATLAB_DIR := /usr/local/MATLAB/R2016b

    # MATLAB_DIR := /Applications/MATLAB_R2012b.app

    # NOTE: this is required only if you will compile the python interface.

    # We need to be able to find Python.h and numpy/arrayobject.h.

    # PYTHON_INCLUDE := /usr/include/python2.7 \


    # Anaconda Python distribution is quite popular. Include path:

    # Verify anaconda location, sometimes it's in root.

    ANACONDA_HOME := $(HOME)/anaconda
    PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
    $(ANACONDA_HOME)/include/python2.7 \
    $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \
    /usr/include/python2.7

    # Uncomment to use Python 3 (default is Python 2)

    # PYTHON_LIBRARIES := boost_python3 python3.5m

    # PYTHON_INCLUDE := /usr/include/python3.5m \

    # /usr/lib/python3.5/dist-packages/numpy/core/include

    # We need to be able to find libpythonX.X.so or .dylib.

    # PYTHON_LIB := /usr/lib


    # Homebrew installs numpy in a non standard path (keg only)

    # PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include

    # PYTHON_LIB += $(shell brew --prefix numpy)/lib

    # Uncomment to support layers written in Python (will link against Python libs)
    # (py-faster-rcnn needs this enabled, since it uses Python layers)

    WITH_PYTHON_LAYER := 1

    # Whatever else you find you need goes here.

    # INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include

    # LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

    INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial

    # If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies

    # INCLUDE_DIRS += $(shell brew --prefix)/include

    # LIBRARY_DIRS += $(shell brew --prefix)/lib

    # Uncomment to use `pkg-config` to specify OpenCV library paths.

    # (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)

    # USE_PKG_CONFIG := 1

    # N.B. both build and distribute dirs are cleared on `make clean`

    BUILD_DIR := build
    DISTRIBUTE_DIR := distribute

    # Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171

    # DEBUG := 1

    # The ID of the GPU that 'make runtest' will use to run unit tests.

    TEST_GPUID := 0

    # enable pretty build (comment to see full commands)

    Q ?= @


To check that py-faster-rcnn was installed successfully, the authors provide a demo that runs a model pre-trained on the PASCAL VOC2007 dataset. The steps are:

  1. Download the pre-trained Faster R-CNN detectors:

    cd py-faster-rcnn
    ./data/scripts/fetch_faster_rcnn_models.sh

    • ZF_faster_rcnn_final.caffemodel: trained with the ZF network
    • VGG16_faster_rcnn_final.caffemodel: trained with the VGG16 network
  2. Run the demo:

    cd py-faster-rcnn
    ./tools/demo.py
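  3. The demo runs detection on 5 images stored in the data/demo/ folder.

    [Figure: detection result for one of the demo images]

  4. If none of the steps above produced an error, py-faster-rcnn has been built and installed successfully.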


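Part of the Faster R-CNN experiments were run on the PASCAL VOC2007 dataset, so to train Faster R-CNN on our own data we first need to understand the directory layout, images, and annotation format of PASCAL VOC2007. We can then build a dataset in the same format from our own data for Faster R-CNN to train and test on.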

Getting the PASCAL VOC2007 dataset

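This part is optional. If you need the PASCAL VOC2007 dataset, you can fetch it with the commands below. Our main reason for downloading it is to inspect its file structure and file contents, so that we can build our own dataset to match.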

  1. Create a dedicated location for datasets, say the $HOME/data folder.

  2. Download the PASCAL VOC2007 training, validation, and test sets:

    cd $HOME/data
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
  3. Once the downloads finish, extract the archives:

    tar xvf VOCtrainval_06-Nov-2007.tar
    tar xvf VOCtest_06-Nov-2007.tar
  4. This produces the following file structure:

    $HOME/data/VOCdevkit/                     # root folder
    $HOME/data/VOCdevkit/VOC2007              # VOC2007 folder
    $HOME/data/VOCdevkit/VOC2007/Annotations  # annotation files
    $HOME/data/VOCdevkit/VOC2007/ImageSets    # holds train.txt, test.txt, val.txt, etc.
    $HOME/data/VOCdevkit/VOC2007/JPEGImages   # image files

    # ... plus other folders and subfolders ...
  5. Create a symlink that points to where the VOC dataset is stored:

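    cd py-faster-rcnn/data
    ln -s $HOME/data/VOCdevkit/ VOCdevkit
    # Note: the py-faster-rcnn README names this link VOCdevkit2007;
    # use that name instead if your dataset code looks for data/VOCdevkit2007.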



  6. The VOC dataset is now set up.


The PASCAL VOC dataset has the following file structure:

└── VOCdevkit
    └── VOC2007
        ├── Annotations
        ├── ImageSets
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages
        ├── SegmentationClass
        └── SegmentationObject
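
To make the relationship between these folders concrete, here is a minimal Python sketch. It relies on the VOC convention that each file under ImageSets/Main lists one image ID per line, and that the ID names both JPEGImages/<id>.jpg and Annotations/<id>.xml. The helper names `voc_paths` and `read_image_set` are made up for illustration and are not part of py-faster-rcnn.

```python
import os


def voc_paths(devkit_root, image_id, year="2007"):
    """Return (image_path, annotation_path) for one image ID."""
    base = os.path.join(devkit_root, "VOC" + year)
    image_path = os.path.join(base, "JPEGImages", image_id + ".jpg")
    annotation_path = os.path.join(base, "Annotations", image_id + ".xml")
    return image_path, annotation_path


def read_image_set(devkit_root, split, year="2007"):
    """Read the image IDs listed in ImageSets/Main/<split>.txt."""
    path = os.path.join(devkit_root, "VOC" + year,
                        "ImageSets", "Main", split + ".txt")
    with open(path) as f:
        # One ID per line; skip blank lines.
        return [line.strip() for line in f if line.strip()]
```

For example, `voc_paths("VOCdevkit", "000005")` gives the paths of the image and annotation file for image 000005 relative to the devkit root.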


The Annotations folder stores the image annotations (the ground truth) as .xml files, one per image. Here is an excerpt from one of them, with comments:

<folder>VOC2007</folder> # required: name of the parent folder
<filename>000005.jpg</filename> # required
<source> # optional
<database>The VOC2007 Database</database>
<annotation>PASCAL VOC2007</annotation>
<owner> # optional
<flickrid>archintent louisville</flickrid>
<size> # image size
<segmented>0</segmented> # used for segmentation
<object> # object info: class and bbox; one <object> tag per object in the image

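Note that in the .xml annotation files we prepare ourselves, the coordinates in the <xmin> and <ymin> tags of every <object> should be greater than 0 and must never be negative. Otherwise training fails with AssertionError: assert (boxes[:, 2] >= boxes[:, 0]).all(). The stock py-faster-rcnn VOC loader subtracts 1 from each coordinate to make it 0-based, so even a value of 0 ends up negative.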
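
As a rough illustration of checking annotations before training, the Python 3 sketch below parses a VOC-style annotation and flags boxes with problematic coordinates. The sample XML values, the helper names `parse_boxes`, `find_bad_boxes` and `clamp_box`, and the rule that coordinates are 1-based and must lie inside the image are my assumptions for the example; none of this code comes from py-faster-rcnn.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: the tag layout follows PASCAL VOC,
# but the file name, size and coordinates are made-up values.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>-3</xmin><ymin>10</ymin><xmax>40</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>"""


def parse_boxes(xml_text):
    """Return (width, height, [(xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append(tuple(int(bndbox.findtext(tag))
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return width, height, boxes


def find_bad_boxes(boxes, width, height):
    """Return boxes that leave the image or have max < min (1-based coords assumed)."""
    bad = []
    for box in boxes:
        xmin, ymin, xmax, ymax = box
        inside = xmin >= 1 and ymin >= 1 and xmax <= width and ymax <= height
        ordered = xmax >= xmin and ymax >= ymin
        if not (inside and ordered):
            bad.append(box)
    return bad


def clamp_box(box, width, height):
    """Clamp a box into [1, width] x [1, height], keeping max >= min."""
    xmin, ymin, xmax, ymax = box
    xmin = max(1, min(xmin, width))
    ymin = max(1, min(ymin, height))
    xmax = max(xmin, min(xmax, width))
    ymax = max(ymin, min(ymax, height))
    return xmin, ymin, xmax, ymax


if __name__ == "__main__":
    width, height, boxes = parse_boxes(SAMPLE_XML)
    for box in find_bad_boxes(boxes, width, height):
        print("bad box:", box, "-> clamped:", clamp_box(box, width, height))
```

Clamping silently changes the ground truth, so it may be preferable to review or drop the flagged boxes instead, depending on how many there are.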

Having analyzed the PASCAL VOC file structure, we simply create a similar structure for our own data:

└── VOCdevkit
    └── VOC2007
        └── Caltech
            ├── Annotations
            ├── ImageSets
            │   └── Main
            └── JPEGImages
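
The folder skeleton can also be created programmatically. The sketch below is one possible way to do it in Python; the function name `make_caltech_layout` and the choice of root directory are assumptions for illustration, and the nesting follows the tree shown above.

```python
import os


def make_caltech_layout(root):
    """Create VOCdevkit/VOC2007/Caltech/{Annotations,ImageSets/Main,JPEGImages} under root."""
    base = os.path.join(root, "VOCdevkit", "VOC2007", "Caltech")
    subdirs = ["Annotations", os.path.join("ImageSets", "Main"), "JPEGImages"]
    created = []
    for sub in subdirs:
        path = os.path.join(base, sub)
        # Check first so the call is safe to repeat (works on Python 2 and 3).
        if not os.path.isdir(path):
            os.makedirs(path)
        created.append(path)
    return created
```

Calling `make_caltech_layout(os.path.expanduser("~/data"))` would create the skeleton under $HOME/data, matching the location used earlier for the VOC download.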


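  • For converting the Caltech .seq files into individual .jpg images, see this reference [link].
  • The .xml annotation files in Annotations were given to me by a senior labmate. The method referenced above can also produce them, but its output does not match the required format.
  • In ImageSets, train.txt is derived from the .xml files, and test.txt is built by taking one frame out of every 30 from each seq.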
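
A minimal Python sketch of how such image set files could be generated is shown below. The function names, the frame naming pattern, and the choice to start sampling at the 30th frame of each sequence are assumptions for illustration; the text above only specifies that train.txt comes from the .xml files and that test.txt uses one frame in every 30.

```python
import os


def ids_from_annotations(file_names):
    """Image IDs for train.txt: one per .xml file, extension stripped, sorted."""
    return sorted(os.path.splitext(name)[0]
                  for name in file_names if name.endswith(".xml"))


def sample_every_nth(frame_ids, step=30, offset=None):
    """Image IDs for test.txt: one frame out of every `step` frames of a sequence.

    offset is the index of the first sampled frame; defaulting to step - 1
    (the 30th frame) is an assumption, not something the text specifies.
    """
    if offset is None:
        offset = step - 1
    return frame_ids[offset::step]


def write_image_set(path, image_ids):
    """Write one image ID per line, as expected under ImageSets/Main."""
    with open(path, "w") as f:
        for image_id in image_ids:
            f.write(image_id + "\n")


if __name__ == "__main__":
    # Hypothetical frame names for a single 90-frame sequence.
    frames = ["set06_V000_I%05d" % i for i in range(90)]
    print(sample_every_nth(frames))
```

In practice `ids_from_annotations` would be fed `os.listdir` of the Annotations folder, and `sample_every_nth` would be applied to each sequence separately before concatenating the results into test.txt.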



