Caffe Source Code Analysis: Layer

bear_fish

Published 2019-02-25 11:50:39



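This article analyzes the Layer abstraction in Caffe. It covers:

An overview of the categories of Caffe layers and what each one does
The core member variables of Layer, explained through the proto definition and the Layer class
The core member functions of the Layer class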

1. Overview of the Layer class

Layers in Caffe fall into the following groups:

Data Layers (input)

Data layers define the input of a Caffe network. They rely on efficient databases such as LevelDB or LMDB, and they can preprocess the data: mean subtraction, scaling, random cropping, mirroring.

Commonly used: Input, ImageData.

Vision Layers (convolution-related), e.g. the Convolution Layer and the Pooling Layer.
Recurrent Layers, e.g. LSTM and RNN.
Common Layers, e.g. Inner Product (fully connected) and Dropout.
Normalization Layers, e.g. Local Response Normalization (LRN) and Batch Normalization.
Activation / Neuron Layers, e.g. ReLU and Sigmoid.
Utility Layers, e.g. Flatten and Reshape.
Loss Layers, e.g. Sigmoid Cross-Entropy Loss and Sum-of-Squares / Euclidean.

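The header files below follow the layout of older Caffe releases; newer releases split each layer into its own header under include/caffe/layers/.

layer.hpp: the base class Layer, which defines the basic interface shared by all layers.

data_layers.hpp: derived from Layer; defines the layers that handle input data, e.g. DataLayer, HDF5DataLayer and ImageDataLayer.

vision_layers.hpp: derived from Layer; defines the layers related to feature representation, e.g. ConvolutionLayer, PoolingLayer and LRNLayer.

neuron_layers.hpp: derived from Layer; defines the layers that apply non-linear transformations, e.g. ReLULayer, TanHLayer and SigmoidLayer.

loss_layers.hpp: derived from Layer; defines the layers that compute the output error, e.g. EuclideanLossLayer, SoftmaxWithLossLayer and HingeLossLayer.

common_layers.hpp: derived from Layer; defines the layers that reshape intermediate results or operate element-wise, e.g. ConcatLayer, InnerProductLayer and SoftmaxLayer.

layer_factory.hpp: the layer factory class, which maintains the table mapping each available layer type to the function that constructs it.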
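The factory idea described for layer_factory.hpp can be sketched as a map from a type string to a creator function. This is only a minimal illustration under my own assumptions: ToyLayer, ToyReLU, ToyPooling, ToyLayerRegistry and MakeToyRegistry are hypothetical names, not Caffe's API. Caffe's own registry is templated on Dtype, is filled through registration macros, and builds layers from a LayerParameter.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal layer interface; Caffe's real Layer is far richer.
struct ToyLayer {
  virtual ~ToyLayer() = default;
  virtual std::string type() const = 0;
};

struct ToyReLU : ToyLayer {
  std::string type() const override { return "ReLU"; }
};

struct ToyPooling : ToyLayer {
  std::string type() const override { return "Pooling"; }
};

// Maps a type string to a function that builds the matching layer.
class ToyLayerRegistry {
 public:
  using Creator = std::function<std::unique_ptr<ToyLayer>()>;

  // Register a creator; each type name may be registered only once.
  void Add(const std::string& type, Creator creator) {
    assert(creators_.count(type) == 0 && "layer type already registered");
    creators_[type] = std::move(creator);
  }

  // Build a layer by name; this sketch returns nullptr for an unknown type.
  std::unique_ptr<ToyLayer> Create(const std::string& type) const {
    auto it = creators_.find(type);
    if (it == creators_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Creator> creators_;
};

// Convenience: a registry pre-populated with the two toy layers.
inline ToyLayerRegistry MakeToyRegistry() {
  ToyLayerRegistry registry;
  registry.Add("ReLU",
               [] { return std::unique_ptr<ToyLayer>(new ToyReLU()); });
  registry.Add("Pooling",
               [] { return std::unique_ptr<ToyLayer>(new ToyPooling()); });
  return registry;
}
```

With such a table, the network builder only needs the type string from the layer definition: MakeToyRegistry().Create("ReLU") yields a ToyReLU without the caller naming the concrete class.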

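Depending on its needs, each layer defines a CPU implementation, a GPU implementation, or both. For example, the CPU and GPU implementations of ConvolutionLayer live in two separate files: conv_layer.cpp and conv_layer.cu.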

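2. Core member variables of Layer, via the proto definition and the Layer class

The core fields of LayerParameter in the proto file are listed below. Besides the basic fields, it carries optional layer-specific sub-messages, such as ConvolutionParameter, that hold the extra parameters of a particular layer type.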

// LayerParameter next available layer-specific ID: 145 (last added: crop_param)
message LayerParameter {
  optional string name = 1; // the layer name
  optional string type = 2; // the layer type
  repeated string bottom = 3; // the name of each bottom blob
  repeated string top = 4; // the name of each top blob

  // The train / test phase for computation.
  optional Phase phase = 10;

  repeated float loss_weight = 5;

  // Specifies training parameters (multipliers on global learning constants,
  // and the name and other settings used for weight sharing).
  repeated ParamSpec param = 6;

  // The blobs containing the numeric parameters of the layer.
  repeated BlobProto blobs = 7;

  // The size must be either 0 or equal to the number of bottoms.
  repeated bool propagate_down = 11;

  // Parameters for data pre-processing.
  optional TransformationParameter transform_param = 100;

  // Parameters shared by loss layers.
  optional LossParameter loss_param = 101;

  // Layer type-specific parameters.
  optional ConvolutionParameter convolution_param = 106;
  optional InnerProductParameter inner_product_param = 117;
}

Among the fields above, ParamSpec deserves a closer look. It is defined as follows:

// Specifies training parameters (multipliers on global learning constants,
// and the name and other settings used for weight sharing).
message ParamSpec {
  // The names of the parameter blobs -- useful for sharing parameters among
  // layers, but never required otherwise.  To share a parameter between two
  // layers, give it a (non-empty) name.
  optional string name = 1;
  //......
  // The multiplier on the global learning rate for this parameter.
  optional float lr_mult = 3 [default = 1.0];

  // The multiplier on the global weight decay for this parameter.
  optional float decay_mult = 4 [default = 1.0];
}

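This field sets the learning rate used when parameters are updated during backpropagation (combined with base_lr from the solver). Because a layer has both a weight and a bias, the field is repeated: one entry per parameter blob. An example: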

layer {
  name: "ip1"
  type: "InnerProduct"
  param { lr_mult: 1 }  # for weight
  param { lr_mult: 2 }  # for bias
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "pool2"
  top: "ip1"
}

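If solver.prototxt sets base_lr: 0.01, the learning rate of the weight is 0.01 * 1 = 0.01 and that of the bias is 0.01 * 2 = 0.02 (before any learning-rate schedule is applied).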
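The arithmetic above can be written down as a small helper. This is a sketch under a simplifying assumption: the rate is just base_lr times lr_mult, with no learning-rate policy applied. EffectiveLearningRates is a hypothetical name, not a Caffe function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: per-blob learning rates under a constant schedule.
// base_lr comes from the solver; lr_mults holds one lr_mult per `param`
// entry of the layer (weight first, then bias in the example above).
inline std::vector<double> EffectiveLearningRates(
    double base_lr, const std::vector<double>& lr_mults) {
  assert(base_lr >= 0.0);
  std::vector<double> rates;
  rates.reserve(lr_mults.size());
  for (double mult : lr_mults) {
    rates.push_back(base_lr * mult);  // global rate scaled per blob
  }
  return rates;
}
```

Calling EffectiveLearningRates(0.01, {1.0, 2.0}) reproduces the two rates of the ip1 layer. A multiplier of 0 would freeze the corresponding blob, since its rate becomes 0.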

3. Core member functions of the Layer class

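template<typename Dtype>
class Layer {
protected:
    // Layer parameters stored in the protobuf file, read from the
    // protocol-buffers network definition; initialized in the constructor.
    LayerParameter layer_param_;
    // Phase of the layer: whether it takes part in training or testing.
    Phase phase_;
    // Learnable parameters (weights and biases). A vector is used because
    // the weights and the bias are stored in two separate blobs.
    // Initialized in the base Layer only if the definition file provides them.
    vector<shared_ptr<Blob<Dtype> > > blobs_;
    // Whether the gradient must be computed for each learnable parameter blob.
    vector<bool> param_propagate_down_;
    // Zero for non-loss layers; for loss layers, the weight of the loss
    // computed for each top blob.
    vector<Dtype> loss_;
private:
    /** Whether this layer is actually shared by other nets */
    bool is_shared_;
    // If the layer is shared, this mutex keeps forward passes sequential.
    shared_ptr<boost::mutex> forward_mutex_;
};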

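The core functions of Layer are Forward and Backward. They dispatch to Forward_cpu / Forward_gpu and Backward_cpu / Backward_gpu. Subclasses must implement the two CPU functions; the GPU functions fall back to the CPU versions unless a subclass overrides them.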

inline Dtype Forward(const vector<Blob<Dtype>*>& bottom,
                     const vector<Blob<Dtype>*>& top);
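// Given the gradient with respect to the top outputs, compute the gradient
// with respect to the inputs and pass it down to the bottom blobs. A layer
// with parameters also computes the gradient with respect to each parameter
// and stores it internally.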
inline void Backward(const vector<Blob<Dtype>*>& top,
                     const vector<bool>& propagate_down,
                     const vector<Blob<Dtype>*>& bottom);

protected:
// Pure virtual; subclasses must implement it. Forward computation on the CPU.
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                         const vector<Blob<Dtype>*>& top) = 0;
// Forward computation on the GPU; falls back to the CPU version if no GPU
// implementation is provided.
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
                         const vector<Blob<Dtype>*>& top) {
    // LOG(WARNING) << "Using CPU code as backup.";
    return Forward_cpu(bottom, top);
}
// Pure virtual; derived classes must implement it.
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
                          const vector<bool>& propagate_down,
                          const vector<Blob<Dtype>*>& bottom) = 0;
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
                          const vector<bool>& propagate_down,
                          const vector<Blob<Dtype>*>& bottom) {
    // LOG(WARNING) << "Using CPU code as backup.";
    Backward_cpu(top, propagate_down, bottom);
}

For example, Backward is only a thin wrapper around Backward_cpu and Backward_gpu:

template <typename Dtype>
inline void Layer<Dtype>::Backward(const vector<Blob<Dtype>*>& top,
                                   const vector<bool>& propagate_down,
                                   const vector<Blob<Dtype>*>& bottom) {
    switch (Caffe::mode()) {
        case Caffe::CPU:
            Backward_cpu(top, propagate_down, bottom);
            break;
        case Caffe::GPU:
            Backward_gpu(top, propagate_down, bottom);
            break;
        default:
            LOG(FATAL) << "Unknown caffe mode.";
    }
}
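The pattern used here, a public wrapper that dispatches on the mode plus protected virtuals with a CPU fallback, can be tried out in isolation. The following is a minimal sketch, not Caffe code: ToyMode, ToyLayerBase, ToyReLULayer and last_path() are hypothetical names, and plain std::vector<float> stands in for Blob.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ToyMode { CPU, GPU };

// Hypothetical stand-in for Layer: the public wrapper dispatches on the
// mode, the protected virtual functions do the actual work.
class ToyLayerBase {
 public:
  virtual ~ToyLayerBase() = default;

  void Forward(ToyMode mode, const std::vector<float>& bottom,
               std::vector<float>* top) {
    assert(top != nullptr);
    switch (mode) {
      case ToyMode::CPU: Forward_cpu(bottom, top); break;
      case ToyMode::GPU: Forward_gpu(bottom, top); break;
    }
  }

  // Records which implementation ran last, for illustration only.
  const std::string& last_path() const { return last_path_; }

 protected:
  // Pure virtual: every subclass has to provide the CPU path.
  virtual void Forward_cpu(const std::vector<float>& bottom,
                           std::vector<float>* top) = 0;
  // Optional: without an override, the GPU path reuses the CPU code.
  virtual void Forward_gpu(const std::vector<float>& bottom,
                           std::vector<float>* top) {
    Forward_cpu(bottom, top);
  }
  std::string last_path_;
};

// A subclass that implements only the CPU path.
class ToyReLULayer : public ToyLayerBase {
 protected:
  void Forward_cpu(const std::vector<float>& bottom,
                   std::vector<float>* top) override {
    last_path_ = "cpu";
    top->resize(bottom.size());
    for (std::size_t i = 0; i < bottom.size(); ++i) {
      (*top)[i] = bottom[i] > 0.0f ? bottom[i] : 0.0f;  // max(x, 0)
    }
  }
};
```

Running a ToyReLULayer with ToyMode::GPU still ends up in Forward_cpu, which mirrors the "Using CPU code as backup" default in the declarations shown earlier.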

Forward follows the same dispatch as Backward:

// Forward and backward propagation interface. Every class derived from
// Layer should implement Forward_cpu().
template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
                                   const vector<Blob<Dtype>*>& top) {
    // Lock during forward to ensure sequential forward
    Lock();
    Dtype loss = 0;
    Reshape(bottom, top);  // the input shape (e.g. batch size) may have changed
    switch (Caffe::mode()) {
        case Caffe::CPU:
            Forward_cpu(bottom, top);
            // .......
            break;
        case Caffe::GPU:
            Forward_gpu(bottom, top);
#ifndef CPU_ONLY
            // GPU implementation omitted
#endif
            break;
        default:
            LOG(FATAL) << "Unknown caffe mode.";
    }
    Unlock();
    return loss;
}
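The listing above elides the part of Forward that turns the top blobs into the returned loss. As far as I understand the Caffe source, that part adds up, for every top blob with a non-zero loss weight, the dot product of the blob's data with its loss weights. The sketch below illustrates that idea only. ToyTop and AccumulateLoss are hypothetical names, and an empty loss_weights vector is my own convention for "this top does not contribute".

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a top blob: output values plus one loss
// weight per element.
struct ToyTop {
  std::vector<float> data;
  std::vector<float> loss_weights;
};

// Sum of data[i] * loss_weights[i] over all contributing top blobs.
inline float AccumulateLoss(const std::vector<ToyTop>& tops) {
  float loss = 0.0f;
  for (const ToyTop& top : tops) {
    if (top.loss_weights.empty()) continue;  // not part of the objective
    assert(top.data.size() == top.loss_weights.size());
    loss += std::inner_product(top.data.begin(), top.data.end(),
                               top.loss_weights.begin(), 0.0f);
  }
  return loss;
}
```

For a non-loss layer every top is skipped, so the function returns 0, which matches the initial value of loss in Forward.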