Siamese originally means "of Siam", i.e. Thai; the most familiar related term is "Siamese twins", meaning conjoined twins. The name Siamese network derives from this image: two branches with identical structure that process separate inputs but share the parameters of every layer, joined at the end. Siamese networks work well for similarity-comparison tasks, and because the branches share parameters, they also reduce the number of parameters to train. These slides cover applications of Siamese networks in deep learning. Below, following the Siamese tutorial in the Caffe documentation and taking LeNet as the example, I briefly summarize how to write the prototxt for a Siamese network in Caffe.
The Data layer takes LMDB or LevelDB input, which can be generated with $CAFFE_ROOT/examples/siamese/create_mnist_siamese.sh (the script builds the DB files from the original MNIST format; to build them from JPEG images instead, it needs some modification). The Data layer has two tops: pair_data, the paired images, and sim, which indicates whether the two images of a pair share the same label.
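Under the hood, the conversion tool stores each pair as a single two-channel image, with the similarity bit as the label. A minimal numpy sketch of that packing (the function name is mine, not Caffe's):

import numpy as np

def pack_pair(img_a, img_b, label_a, label_b):
    # Stack two 28x28 grayscale digits along the channel axis,
    # mirroring what the siamese conversion tool writes into the DB.
    pair = np.stack([img_a, img_b], axis=0)  # shape (2, 28, 28)
    sim = 1 if label_a == label_b else 0     # 1 = same class, 0 = different
    return pair, sim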
The Slice layer is a utility layer in Caffe that splits an input blob (bottom) into several output blobs (tops). The official documentation gives the following example:
layer {
  name: "slicer_label"
  type: "Slice"
  bottom: "label"
  ## Example of label with a shape N x 3 x 1 x 1
  top: "label1"
  top: "label2"
  top: "label3"
  slice_param {
    axis: 1
    slice_point: 1
    slice_point: 2
  }
}
This splits slicer_label into three parts. axis is the dimension along which to slice; 1 means the second dimension. slice_point lists the cut positions: the values 1 and 2 place cuts between the first and second slices and between the second and third, so each top here has shape N x 1 x 1 x 1.
In a Siamese network, so that the two images of each pair can be fed through separate branches, a Slice layer is attached right after the Data layer to split the data evenly into two parts.
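The effect of this slice is equivalent to the following numpy operation (a sketch, not Caffe code):

import numpy as np

# pair_data as produced by the Data layer: a batch of two-channel images
pair_data = np.random.rand(64, 2, 28, 28)

# Equivalent of Slice with axis=1 (channels) and slice_point=1:
data, data_p = np.split(pair_data, [1], axis=1)
print(data.shape, data_p.shape)  # (64, 1, 28, 28) (64, 1, 28, 28)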
The convolution, pooling, and ReLU layers that follow are identical for the two branches, so you can write one branch and copy it as the other, changing only each layer's name, bottom, and top (in the example, every name in the second branch gets a _p suffix, for "pair").
How do the two branches share parameters? In Caffe this is done by giving the corresponding layers of each branch a param entry with the same name; parameters that share a name are backed by the same blob, so updates apply to both layers. For example:
...
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    name: "ip2_w"
    lr_mult: 1
  }
}
...
layer {
  name: "ip2_p"
  type: "InnerProduct"
  bottom: "ip1_p"
  top: "ip2_p"
  param {
    name: "ip2_w"
    lr_mult: 1
  }
}
...
In the example above, the corresponding layers of both branches declare the parameter ip2_w, so during training they read and update the same weights.
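You can confirm the sharing from pycaffe; a quick sketch, assuming the definition below is saved as examples/siamese/mnist_siamese_train_test.prototxt:

import caffe

net = caffe.Net('examples/siamese/mnist_siamese_train_test.prototxt', caffe.TRAIN)
# Weights of ip2 and ip2_p should be the same blob, since both declare ip2_w.
w, w_p = net.params['ip2'][0].data, net.params['ip2_p'][0].data
print((w == w_p).all())  # expected: True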
After the two fully connected layers, the softmax classification layer of the original LeNet is replaced by a fully connected layer that outputs a 2-dimensional vector:
layer {
  name: "feat"
  type: "InnerProduct"
  bottom: "ip2"
  top: "feat"
  param {
    name: "feat_w"
    lr_mult: 1
  }
  param {
    name: "feat_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
Once the two features are produced, the loss can be computed from them together with the sim label defined earlier. The Siamese network uses a loss called ContrastiveLoss: the more similar the two images, the smaller the loss; the less similar, the larger.
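Concretely, writing d_n for the Euclidean distance between the two feature vectors of the n-th pair and y_n for its sim label, Caffe's ContrastiveLoss (following Hadsell et al., 2006) computes:

E = \frac{1}{2N} \sum_{n=1}^{N} \left[ y_n\, d_n^2 + (1 - y_n)\, \max(\mathrm{margin} - d_n,\ 0)^2 \right],
\qquad d_n = \lVert \mathrm{feat}_n - \mathrm{feat\_p}_n \rVert_2

Similar pairs (y_n = 1) are pulled together, while dissimilar pairs are pushed apart until their distance exceeds the margin.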
layer {
  name: "loss"
  type: "ContrastiveLoss"
  bottom: "feat"
  bottom: "feat_p"
  bottom: "sim"
  top: "loss"
  contrastive_loss_param {
    margin: 1
  }
}
That is the entire network structure. The $CAFFE_ROOT/python/draw_net.py script can draw the network, as shown in the figure:
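For example (the output image name is my choice):

python python/draw_net.py examples/siamese/mnist_siamese_train_test.prototxt mnist_siamese.png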
The complete network definition is as follows:
name: "mnist_siamese_train_test"
layer {
name: "pair_data"
type: "Data"
top: "pair_data"
top: "sim"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/siamese/mnist_siamese_train_leveldb"
batch_size: 64
}
}
layer {
name: "pair_data"
type: "Data"
top: "pair_data"
top: "sim"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/siamese/mnist_siamese_test_leveldb"
batch_size: 100
}
}
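# Split each two-channel pair_data image into the two single-channel branch inputs.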
layer {
  name: "slice_pair"
  type: "Slice"
  bottom: "pair_data"
  top: "data"
  top: "data_p"
  slice_param {
    slice_dim: 1
    slice_point: 1
  }
}
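# First branch: a LeNet-style stack ending in a 2-D feature.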
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    name: "conv1_w"
    lr_mult: 1
  }
  param {
    name: "conv1_b"
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    name: "conv2_w"
    lr_mult: 1
  }
  param {
    name: "conv2_b"
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    name: "ip1_w"
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    name: "ip2_w"
    lr_mult: 1
  }
  param {
    name: "ip2_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "feat"
  type: "InnerProduct"
  bottom: "ip2"
  top: "feat"
  param {
    name: "feat_w"
    lr_mult: 1
  }
  param {
    name: "feat_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
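# Second branch: identical structure with _p suffixes; weights are tied to the
# first branch through the shared param names.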
layer {
  name: "conv1_p"
  type: "Convolution"
  bottom: "data_p"
  top: "conv1_p"
  param {
    name: "conv1_w"
    lr_mult: 1
  }
  param {
    name: "conv1_b"
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1_p"
  type: "Pooling"
  bottom: "conv1_p"
  top: "pool1_p"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2_p"
  type: "Convolution"
  bottom: "pool1_p"
  top: "conv2_p"
  param {
    name: "conv2_w"
    lr_mult: 1
  }
  param {
    name: "conv2_b"
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2_p"
  type: "Pooling"
  bottom: "conv2_p"
  top: "pool2_p"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1_p"
  type: "InnerProduct"
  bottom: "pool2_p"
  top: "ip1_p"
  param {
    name: "ip1_w"
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1_p"
  type: "ReLU"
  bottom: "ip1_p"
  top: "ip1_p"
}
layer {
  name: "ip2_p"
  type: "InnerProduct"
  bottom: "ip1_p"
  top: "ip2_p"
  param {
    name: "ip2_w"
    lr_mult: 1
  }
  param {
    name: "ip2_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "feat_p"
  type: "InnerProduct"
  bottom: "ip2_p"
  top: "feat_p"
  param {
    name: "feat_w"
    lr_mult: 1
  }
  param {
    name: "feat_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
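# Contrastive loss over the two 2-D features and the similarity label.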
layer {
  name: "loss"
  type: "ContrastiveLoss"
  bottom: "feat"
  bottom: "feat_p"
  bottom: "sim"
  top: "loss"
  contrastive_loss_param {
    margin: 1
  }
}
Training proceeds the same way as for any other network, so I will not go into the details here.
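For reference, the Caffe example ships a launch script; assuming the DB files from the first step exist, training should start with (run from $CAFFE_ROOT):

./examples/siamese/train_mnist_siamese.sh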