[Caffe] Source Code Analysis: Layer

http://imbinwang.github.io/blog/inside-caffe-code-layer/
Bin Wang

Jun 30, 2015
8 minute read

Layer is the largest and most intricate module in Caffe, and it is the network's basic unit of computation. Because Caffe emphasizes modular design, each layer is only allowed to perform one specific kind of computation, such as convolution, pooling, a nonlinear transform, an inner product, or data loading, normalization, and loss computation.

Module Overview

Each layer takes its input from a set of 'bottom' blobs and writes its output to a set of 'top' blobs. The parameter definitions for every layer type in Caffe live in the caffe.proto file, while the concrete parameter values of a layer are given in the protocol buffer network definition file of the specific application. For example, the parameters of the convolution layer (ConvolutionLayer) are defined in caffe.proto as follows:

// in caffe.proto
// Message that stores parameters used by ConvolutionLayer
message ConvolutionParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  optional bool bias_term = 2 [default = true]; // whether to have bias terms
  // Pad, kernel size, and stride are all given as a single value for equal
  // dimensions in height and width or as Y, X pairs.
  optional uint32 pad = 3 [default = 0]; // The padding size (equal in Y, X)
  optional uint32 pad_h = 9 [default = 0]; // The padding height
  optional uint32 pad_w = 10 [default = 0]; // The padding width
  optional uint32 kernel_size = 4; // The kernel size (square)
  optional uint32 kernel_h = 11; // The kernel height
  optional uint32 kernel_w = 12; // The kernel width
  optional uint32 group = 5 [default = 1]; // The group size for group conv
  optional uint32 stride = 6 [default = 1]; // The stride (equal in Y, X)
  optional uint32 stride_h = 13; // The stride height
  optional uint32 stride_w = 14; // The stride width
  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];
}

These parameters include the number of convolution kernels, the kernel size, the stride, and so on. In the network definition file examples/mnist/lenet_train_test.prototxt, a concrete convolution layer (ConvolutionLayer) is defined like this:

# in examples/mnist/lenet_train_test.prototxt
layer {
  name: "conv1"       # name of the layer
  type: "Convolution" # type of the layer, i.e. which computation it performs
  bottom: "data"      # name of the layer's input data blob
  top: "conv1"        # name of the layer's output data blob
  param {             # parameters for the layer's weights and biases
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param { # parameters of the convolution operation
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

The layer's input/output structure:
bottom blob -> conv layer -> top blob

Every layer type must define three key operations, LayerSetUp, Forward, and Backward (a sketch follows this list):

LayerSetUp: initializes the layer and its connections when the network is constructed
Forward: the forward pass; given the bottom input data, computes the output into the top blobs
Backward: the backward pass; given the gradient w.r.t. the top, computes the gradient w.r.t. the bottom and stores it in the bottom blobs
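
To make the contract concrete, here is a minimal sketch of a hypothetical custom layer that fills in these hooks. ScaleByTwoLayer is not part of Caffe; it simply multiplies its input by two, and the GPU versions are omitted because the base class falls back to the CPU code (shown later in this post).

// A minimal sketch of a custom layer (hypothetical, for illustration only).
#include "caffe/layer.hpp"

namespace caffe {

template <typename Dtype>
class ScaleByTwoLayer : public Layer<Dtype> {
 public:
  explicit ScaleByTwoLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}

  // One-time setup; nothing to read from layer_param_ in this toy example.
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {}

  // Elementwise layer: the output has exactly the input's shape.
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    top[0]->ReshapeLike(*bottom[0]);
  }

 protected:
  // Forward pass: top = 2 * bottom.
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    Dtype* top_data = top[0]->mutable_cpu_data();
    for (int i = 0; i < bottom[0]->count(); ++i) {
      top_data[i] = Dtype(2) * bottom_data[i];
    }
  }

  // Backward pass: by the chain rule, bottom_diff = 2 * top_diff.
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) {
    if (!propagate_down[0]) { return; }
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    for (int i = 0; i < bottom[0]->count(); ++i) {
      bottom_diff[i] = Dtype(2) * top_diff[i];
    }
  }
};

}  // namespace caffe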

Implementation Details

Seven header files in Caffe relate to Layer:

layer.hpp: the base class Layer, which defines the basic interface of all layers.
data_layers.hpp: derived from Layer; defines the sublayers that handle input data, e.g. DataLayer, HDF5DataLayer, and ImageDataLayer.
vision_layers.hpp: derived from Layer; defines the sublayers related to feature extraction, e.g. ConvolutionLayer, PoolingLayer, and LRNLayer.
neuron_layers.hpp: derived from Layer; defines the sublayers that implement nonlinear transforms, e.g. ReLULayer, TanHLayer, and SigmoidLayer.
loss_layers.hpp: derived from Layer; defines the sublayers that compute the output error, e.g. EuclideanLossLayer, SoftmaxWithLossLayer, and HingeLossLayer.
common_layers.hpp: derived from Layer; defines the sublayers for reshaping intermediate results and for elementwise operations, e.g. ConcatLayer, InnerProductLayer, and SoftmaxLayer.
layer_factory.hpp: the layer factory class, which maintains the map from the available layer types to their construction functions.

Depending on its needs, each layer provides a CPU and possibly a GPU implementation; for example, the CPU and GPU implementations of ConvolutionLayer live in the two files conv_layer.cpp and conv_layer.cu.

The Base Class Layer

layer.hpp defines Layer's basic interface. The member variables:

protected:
  /** The protobuf that stores the layer parameters */
  // layer description parameters, read from the network definition file in
  // protocol buffer format
  LayerParameter layer_param_;
  /** The phase: TRAIN or TEST */
  // layer state: whether the layer takes part in network training or testing
  Phase phase_;
  /** The vector that stores the learnable parameters as a set of blobs. */
  // the layer's weight and bias parameters; a vector is used because the
  // weights and the biases are kept in two separate blobs
  vector<shared_ptr<Blob<Dtype> > > blobs_;
  /** Vector indicating whether to compute the diff of each param blob. */
  // flags whether each parameter blob needs its gradient computed during the
  // backward pass
  vector<bool> param_propagate_down_;

  /** The vector that indicates whether each top blob has a non-zero weight in
   *  the objective function. */
  // zero for layers other than loss layers; in loss layers, the weight of the
  // loss computed by each top blob
  vector<Dtype> loss_;
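
For instance, once a layer is set up, its learnable parameters can be read back through the public blobs() accessor. A small sketch, assuming a layer with a bias term (such as ConvolutionLayer), where blobs_[0] holds the weights and blobs_[1] the biases:

// Sketch: inspecting a layer's learnable parameters from outside.
const vector<shared_ptr<Blob<float> > >& params = layer->blobs();
const float* weights = params[0]->cpu_data(); // filter weights
const float* bias = params[1]->cpu_data();    // bias values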

The constructor and destructor:

  /**
   * You should not implement your own constructor. Any set up code should go
   * to SetUp(), where the dimensions of the bottom blobs are provided to the
   * layer.
   */
  // The explicit constructor should not be rewritten; any initialization work
  // belongs in SetUp(). The constructor only copies the layer description
  // parameter values, including any weight and bias blobs they provide.
  explicit Layer(const LayerParameter& param)
    : layer_param_(param) {
      // Set phase and copy blobs (if there are any).
      phase_ = param.phase();
      if (layer_param_.blobs_size() > 0) {
        blobs_.resize(layer_param_.blobs_size());
        for (int i = 0; i < layer_param_.blobs_size(); ++i) {
          blobs_[i].reset(new Blob<Dtype>());
          blobs_[i]->FromProto(layer_param_.blobs(i));
        }
      }
    }
  // virtual destructor
  virtual ~Layer() {}

The initialization function SetUp; every Layer object must follow this fixed calling pattern:

  /**
   * @brief Implements common layer setup functionality.
   *
   * @param bottom the preshaped input blobs
   *        (the layer's input data; the blobs' storage is already allocated)
   * @param top
   *     the allocated but unshaped output blobs, to be shaped by Reshape
   *     (the layer's output data; the blob objects are constructed, but their
   *     storage is not yet allocated; the required size depends on the bottom
   *     blobs and layer_param_ together, and is set in Reshape)
   *
   * Checks that the number of bottom and top blobs is correct.
   * Calls LayerSetUp to do special layer setup for individual layer types,
   * followed by Reshape to set up sizes of top blobs and internal buffers.
   * Sets up the loss weight multiplier blobs for any non-zero loss weights.
   * This method may not be overridden.
   *
   * 1. Check that the numbers of input and output blobs meet the layer's
   *    requirements; layers differ in how many they can handle.
   * 2. Call LayerSetUp for layer-specific initialization; each Layer subclass
   *    overrides this function to perform its custom setup.
   * 3. Call Reshape to allocate appropriately sized storage for the top blobs.
   * 4. Set the loss weight multiplier for each top blob; for layers other
   *    than loss layers the value is zero.
   *
   * This method is non-virtual; its fixed pattern is not to be overridden.
   */
  void SetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    CheckBlobCounts(bottom, top);
    LayerSetUp(bottom, top);
    Reshape(bottom, top);
    SetLossWeights(top);
  }
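
For context, this fixed pattern is driven by the network while it is being constructed. A condensed sketch of the relevant loop in Net<Dtype>::Init (member names follow net.cpp; error handling and logging are omitted):

// Every layer is set up exactly once through the non-virtual SetUp.
for (int layer_id = 0; layer_id < param.layer_size(); ++layer_id) {
  // bottom_vecs_[layer_id] and top_vecs_[layer_id] were wired up from the
  // prototxt connectivity just before this call.
  layers_[layer_id]->SetUp(bottom_vecs_[layer_id], top_vecs_[layer_id]);
}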

The initialization function LayerSetUp, which every Layer subclass must override:

  /**
   * @brief Does layer-specific setup: your layer should implement this
   *        function as well as Reshape.
   *        (Custom initialization; every Layer subclass must implement this
   *        virtual function.)
   *
   * @param bottom
   *     the preshaped input blobs, whose data fields store the input data for
   *     this layer (the data members data_ and diff_ hold the actual data)
   * @param top
   *     the allocated but unshaped output blobs (the blob objects are
   *     constructed, but their data members' storage is not yet allocated)
   *
   * This method should do one-time layer specific setup. This includes reading
   * and processing relevant parameters from the <code>layer_param_</code>,
   * such as the layer's weights and biases. Setting up the shapes of top blobs
   * and internal buffers should be done in <code>Reshape</code>, which will be
   * called before the forward pass to adjust the top blob sizes.
   */
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {}
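
As a concrete example, here is a condensed sketch modeled on InnerProductLayer::LayerSetUp: it reads its parameters from layer_param_, then allocates and fills the learnable blobs (error checks and the bias path are trimmed for brevity):

// Condensed sketch of a concrete LayerSetUp override.
template <typename Dtype>
void InnerProductLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  N_ = this->layer_param_.inner_product_param().num_output();
  K_ = bottom[0]->count(1); // flatten all axes after the batch axis
  this->blobs_.resize(2);   // blobs_[0]: weights, blobs_[1]: bias
  // Weight blob of shape N x K, filled by the configured weight_filler.
  vector<int> weight_shape(2);
  weight_shape[0] = N_;
  weight_shape[1] = K_;
  this->blobs_[0].reset(new Blob<Dtype>(weight_shape));
  shared_ptr<Filler<Dtype> > weight_filler(GetFiller<Dtype>(
      this->layer_param_.inner_product_param().weight_filler()));
  weight_filler->Fill(this->blobs_[0].get());
  // ... the bias blob is set up analogously ...
  this->param_propagate_down_.resize(this->blobs_.size(), true);
}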

The Reshape function, which every Layer subclass must also override; it sets the shapes of the top blobs and allocates their storage:

  /**
   * @brief Adjust the shapes of top blobs and internal buffers to accommodate
   *        the shapes of the bottom blobs.
   *        (Computes the shapes of the top blobs from the shapes of the
   *        bottom blobs and layer_param_, and allocates storage for them.)
   *
   * @param bottom the input blobs, with the requested input shapes
   * @param top the top blobs, which should be reshaped as needed
   *
   * This method should reshape top blobs as needed according to the shapes
   * of the bottom (input) blobs, as well as reshaping any internal buffers
   * and making any other necessary adjustments so that the layer can
   * accommodate the bottom blobs.
   */
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) = 0;
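
The simplest concrete override is the one shared by all the nonlinearity layers: an elementwise layer's output has exactly the input's shape, so its Reshape is a single ReshapeLike call. This is essentially what NeuronLayer does for ReLULayer, SigmoidLayer, TanHLayer, and friends:

// Elementwise Reshape: the top blob mirrors the bottom blob's shape.
template <typename Dtype>
void NeuronLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  top[0]->ReshapeLike(*bottom[0]);
}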

The forward propagation function Forward and the backward propagation function Backward:

  inline Dtype Forward(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  inline void Backward(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);

These two functions are non-virtual. Internally they call the following virtual functions to carry out the forward data pass and the backward error propagation; each Layer subclass must override the CPU versions, and may override the GPU versions, which otherwise fall back to the CPU code:

virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) = 0;
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // LOG(WARNING) << "Using CPU code as backup.";
  return Forward_cpu(bottom, top);
}

virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) = 0;
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  // LOG(WARNING) << "Using CPU code as backup.";
  Backward_cpu(top, propagate_down, bottom);
}
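
For reference, a condensed sketch of what the non-virtual Forward does, simplified from layer.hpp (the GPU branch mirrors the CPU one): it dispatches on the execution mode, then folds any loss-weighted top blobs into the returned loss.

template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Dtype loss = 0;
  switch (Caffe::mode()) {
  case Caffe::CPU:
    Forward_cpu(bottom, top);
    // Accumulate the loss: the top diffs were pre-filled with the loss
    // weights by SetLossWeights during SetUp.
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      loss += caffe_cpu_dot(top[top_id]->count(),
          top[top_id]->cpu_data(), top[top_id]->cpu_diff());
    }
    break;
  case Caffe::GPU:
    Forward_gpu(bottom, top);
    // ... the same accumulation using the GPU math functions ...
    break;
  default:
    LOG(FATAL) << "Unknown caffe mode.";
  }
  return loss;
}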

Layer's serialization function copies the layer description parameters layer_param_ and the weight and bias blobs blobs_ into a LayerParameter object, ready to be written to disk:

// Serialize LayerParameter to protocol buffer
template <typename Dtype>
void Layer<Dtype>::ToProto(LayerParameter* param, bool write_diff) {
  param->Clear();
  param->CopyFrom(layer_param_); // copy the layer description parameters
  param->clear_blobs();
  // copy the weight and bias blobs_
  for (int i = 0; i < blobs_.size(); ++i) {
    blobs_[i]->ToProto(param->add_blobs(), write_diff);
  }
}
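
A short usage sketch of this serialization path (the file name is illustrative; WriteProtoToBinaryFile is declared in caffe/util/io.hpp):

// Snapshot one layer's parameters to disk.
LayerParameter saved;
layer->ToProto(&saved, false); // false: write weights only, no gradients
WriteProtoToBinaryFile(saved, "conv1_layer.binaryproto");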

Subclasses: Data Layers

Data enters Caffe's processing pipeline through the data layers, which sit at the very bottom of the network Net. The data can come from an efficient database (LevelDB or LMDB), directly from memory, or, when efficiency is not a major concern, from HDF5 files or common image-format files on disk. The data layers inherit from Layer; the inheritance hierarchy is shown in the figure below.
(figure: inheritance hierarchy of the data layers)
The final concrete sublayers are DataLayer, ImageDataLayer, WindowDataLayer, MemoryDataLayer, HDF5DataLayer, HDF5OutputLayer, and DummyDataLayer. Only DataLayer is analyzed here; the other data layers are similar.
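
For comparison with the conv1 definition earlier, this is how the DataLayer feeding LeNet is defined in examples/mnist/lenet_train_test.prototxt. Note the two top blobs, data and label, and the absence of any bottom:

# in examples/mnist/lenet_train_test.prototxt
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625 # 1/256, scales pixel values into [0, 1)
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}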

First, consider how DataLayer's LayerSetUp is implemented. DataLayer inherits this method directly from its parent class BasePrefetchingDataLayer:

// in base_data_layer.cpp
template <typename Dtype>
void BasePrefetchingDataLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // 1. call the grandparent class BaseDataLayer's LayerSetUp
  BaseDataLayer<Dtype>::LayerSetUp(bottom, top);
  // Now, start the prefetch thread. Before calling prefetch, we make two
  // cpu_data calls so that the prefetch thread does not accidentally make
  // simultaneous cudaMalloc calls when the main thread is running. In some
  // GPUs this seems to cause failures if we do not so.
  // 2. touch the prefetch buffers so their storage is allocated up front
  this->prefetch_data_.mutable_cpu_data();
  if (this->output_labels_) {
    this->prefetch_label_.mutable_cpu_data();
  }

  // 3. create the data prefetching thread
  DLOG(INFO) << "Initializing prefetch";
  this->CreatePrefetchThread();
  DLOG(INFO) << "Prefetch initialized.";
}

The flow is roughly as follows.

1. Call the grandparent class BaseDataLayer's LayerSetUp:

// in base_data_layer.cpp
template <typename Dtype>
void BaseDataLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  if (top.size() == 1) {
    output_labels_ = false;
  } else {
    output_labels_ = true;
  }
  // The subclasses should setup the size of bottom and top
  DataLayerSetUp(bottom, top);
  data_transformer_.reset(
      new DataTransformer<Dtype>(transform_param_, this->phase_));
  data_transformer_->InitRand();
}

Based on the number of top blobs, this decides whether the layer also outputs labels, sets output_labels_ accordingly, and then calls DataLayerSetUp, which dispatches to the subclass implementation:

// in data_layer.cpp
template <typename Dtype>
void DataLayer<Dtype>::DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // Initialize DB
  // open the source database
  db_.reset(db::GetDB(this->layer_param_.data_param().backend()));
  db_->Open(this->layer_param_.data_param().source(), db::READ);
  cursor_.reset(db_->NewCursor());

  // Check if we should randomly skip a few data points
  if (this->layer_param_.data_param().rand_skip()) {
    unsigned int skip = caffe_rng_rand() %
        this->layer_param_.data_param().rand_skip();
    LOG(INFO) << "Skipping first " << skip << " data points.";
    while (skip-- > 0) {
      cursor_->Next();
    }
  }
  // Read a data point, and use it to initialize the top blob.
  // read one datum, only to determine its storage size; it is not
  // written to the top blob
  Datum datum;
  datum.ParseFromString(cursor_->value());

  bool force_color = this->layer_param_.data_param().force_encoded_color();
  if ((force_color && DecodeDatum(&datum, true)) ||
      DecodeDatumNative(&datum)) {
    LOG(INFO) << "Decoding Datum";
  }
  // image
  // size the outputs according to the transform (crop) parameters
  int crop_size = this->layer_param_.transform_param().crop_size();
  if (crop_size > 0) {
    // allocate storage for the top blob and for the prefetch buffer
    top[0]->Reshape(this->layer_param_.data_param().batch_size(),
        datum.channels(), crop_size, crop_size);
    this->prefetch_data_.Reshape(this->layer_param_.data_param().batch_size(),
        datum.channels(), crop_size, crop_size);
    this->transformed_data_.Reshape(1, datum.channels(), crop_size, crop_size);
  } else {
    top[0]->Reshape(
        this->layer_param_.data_param().batch_size(), datum.channels(),
        datum.height(), datum.width());
    this->prefetch_data_.Reshape(this->layer_param_.data_param().batch_size(),
        datum.channels(), datum.height(), datum.width());
    this->transformed_data_.Reshape(1, datum.channels(),
        datum.height(), datum.width());
  }
  LOG(INFO) << "output data size: " << top[0]->num() << ","
      << top[0]->channels() << "," << top[0]->height() << ","
      << top[0]->width();
  // label
  if (this->output_labels_) {
    vector<int> label_shape(1, this->layer_param_.data_param().batch_size());
    top[1]->Reshape(label_shape);
    this->prefetch_label_.Reshape(label_shape);
  }
}

This opens the source database, reads one datum, preprocesses it, and allocates storage for the top blobs and, at the same time, for the prefetch buffers.

2. Touch the prefetch buffers, so that their storage is allocated up front.

3. Call CreatePrefetchThread to create the thread that prefetches the data.

That completes the layer initialization. Next, consider DataLayer's Forward implementation. Because DataLayer sits at the very bottom of the network, it needs no Backward. DataLayer inherits its Forward directly from the parent class BasePrefetchingDataLayer, which implements only the CPU version, Forward_cpu:

// in base_data_layer.cpp
template <typename Dtype>
void BasePrefetchingDataLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // First, join the thread
  // wait for the prefetch thread to finish loading data
  JoinPrefetchThread();
  DLOG(INFO) << "Thread joined";
  // Reshape to loaded data.
  top[0]->Reshape(this->prefetch_data_.num(), this->prefetch_data_.channels(),
      this->prefetch_data_.height(), this->prefetch_data_.width());
  // Copy the data
  // copy the prefetched data into the top blobs
  caffe_copy(prefetch_data_.count(), prefetch_data_.cpu_data(),
      top[0]->mutable_cpu_data());
  DLOG(INFO) << "Prefetch copied";
  if (this->output_labels_) {
    caffe_copy(prefetch_label_.count(), prefetch_label_.cpu_data(),
        top[1]->mutable_cpu_data());
  }
  // Start a new prefetch thread
  // create a new thread for the next prefetch
  DLOG(INFO) << "CreatePrefetchThread";
  CreatePrefetchThread();
}

As can be seen, DataLayer's Forward_cpu fetches the data from the source ahead of time on a separate thread; when the data is needed, it copies the prefetched batch into the top blobs, completing the forward pass.

P.S. Note the two macro invocations at the end of the data_layer.cpp file:

INSTANTIATE_CLASS(DataLayer);
REGISTER_LAYER_CLASS(Data);

What are they used for? Look at their definitions:

// ------ in common.hpp ------
// Instantiate a class with float and double specifications.
#define INSTANTIATE_CLASS(classname) \
  char gInstantiationGuard##classname; \
  template class classname<float>; \
  template class classname<double>
// ------ in common.hpp ------

// ------ in layer_factory.hpp ------
#define REGISTER_LAYER_CREATOR(type, creator) \
  static LayerRegisterer<float> g_creator_f_##type(#type, creator<float>); \
  static LayerRegisterer<double> g_creator_d_##type(#type, creator<double>) \

#define REGISTER_LAYER_CLASS(type) \
  template <typename Dtype> \
  shared_ptr<Layer<Dtype> > Creator_##type##Layer(const LayerParameter& param) \
  { \
    return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param)); \
  } \
  REGISTER_LAYER_CREATOR(type, Creator_##type##Layer)
// ------ in layer_factory.hpp ------

INSTANTIATE_CLASS(DataLayer) instantiates the DataLayer class template for the float and double types, and REGISTER_LAYER_CLASS(Data) registers DataLayer's construction function with the layer_factory, so that a layer object can be obtained directly from the layer's type name (Data). Every built-in Caffe layer ends its implementation file with these two macros.
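
A brief sketch of what this registration enables: given only the type string from a prototxt, the factory can construct the right layer object. The call is LayerRegistry<Dtype>::CreateLayer from layer_factory.hpp; the parameter values here are illustrative.

// Construct a layer from its registered type name alone, which is
// exactly what Net does for every layer in a network definition.
LayerParameter param;
param.set_name("mnist");
param.set_type("Data"); // looked up in the registry filled by REGISTER_LAYER_CLASS
shared_ptr<Layer<float> > layer = LayerRegistry<float>::CreateLayer(param);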

Subclasses: Vision Layers

Vision Layers, rendered here loosely as feature-extraction layers, usually take "images" as input and produce "images" as output. An "image" here can be a real-world single-channel grayscale image, an RGB color image, or a multi-channel 2D matrix. In the Caffe context, the defining characteristic of an "image" is its spatial structure: width w > 1 and height h > 1. This 2D nature means vision layers operate on local regions, as in convolution and pooling. Vision Layers also inherit from Layer; the inheritance hierarchy is shown in the figure below.

(figure: inheritance hierarchy of the vision layers)

The final concrete sublayers include ConvolutionLayer, CuDNNConvolutionLayer, PoolingLayer, CuDNNPoolingLayer, LRNLayer, and DeconvolutionLayer, plus a few auxiliary sublayers, Im2colLayer and SplitLayer. ConvolutionLayer will be analyzed in detail here. (To be continued.)
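
In the meantime, a small taste of the 2D locality described above: the spatial size of a convolution's output follows directly from the pad, kernel, and stride values seen in ConvolutionParameter. The helper function below is illustrative, but the formula matches the computation ConvolutionLayer performs when reshaping its top blob.

// Output spatial extent of a convolution:
//   out = (in + 2 * pad - kernel) / stride + 1
// Example: a 28x28 MNIST image through the conv1 layer above
// (kernel_size 5, stride 1, no padding) gives (28 - 5) / 1 + 1 = 24,
// i.e. 24x24 output maps.
inline int conv_out_size(int in, int pad, int kernel, int stride) {
  return (in + 2 * pad - kernel) / stride + 1;
}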