
Paper Practice Discussion - Pyramid Scene Parsing Network

From the GitHub issues of PSPNet


Author’s answer

For training, the issues are mainly related to the bn layer:

  • 1 Whether to update the parameters (mean, variance, slope, bias) of the ‘bn’ layer?
    • If you are working on the same (or a similar) dataset as the released model, you can simply fix the ‘bn’ layer for fine-tuning;
    • if not, you may need to update the parameters.
  • 2 What needs more attention when updating the parameters?
    • The batch size used for batch normalization is important, and it is better to set it above 16 in a calculation step, because the current mean and variance need to stay close to the global statistics that will be used at test time.
    • Semantic segmentation is memory-consuming, and maintaining a larger crop size (which depends on the dataset) may force a small batch size on each GPU card. So during our training step, we use MPI to gather data from different GPU cards and then do the bn operation. It seems that the current official Caffe doesn’t support such communication. We are trying to make our training code compatible with BVLC, and you can have a look at yjxiong’s version of Caffe, which is an OpenMPI-based multi-GPU version. If you are working on other datasets, other platforms may support such bn communication.
  • 3 Why is lr_mult set so large in the conv layers that follow the conv5_3/relu layer? e.g.
layer {
  name: "conv6"
  type: "Convolution"
  bottom: "conv5_4"
  top: "conv6"
  param {
    lr_mult: 10
    decay_mult: 1
  }
  param {
    lr_mult: 20
    decay_mult: 1
  }
  convolution_param {
    num_output: 21
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
  }
}

And what are the differences between your BN layer and the one from BVLC/caffe?
    • We assume that the newly added layers should have a larger lr than layers that are initialized from pretrained models. As for the bn question, the one in ‘BVLC/caffe’ does the normalization first and is then followed by a ‘scale’ layer that learns the affine transform, while the one in this repo merges the ‘slope’ and ‘bias’ of the ‘scale’ layer into the bn layer.
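
The difference between the two BN variants can be illustrated with a small sketch. It is a plain-Python illustration and not code from either repository: the function names `bn_then_scale` and `merged_bn` and the `eps` value are assumptions made for this example. It only shows that normalizing first and then applying slope and bias in a separate step gives the same result as doing everything in one step.

```python
import math

def bn_then_scale(x, mean, var, slope, bias, eps=1e-5):
    """Two-step style: normalize first, then apply a separate scale step."""
    normalized = [(v - mean) / math.sqrt(var + eps) for v in x]  # normalization only
    return [slope * n + bias for n in normalized]                # separate slope/bias

def merged_bn(x, mean, var, slope, bias, eps=1e-5):
    """Merged style: normalization, slope and bias applied in one step."""
    inv_std = 1.0 / math.sqrt(var + eps)
    return [slope * (v - mean) * inv_std + bias for v in x]

# Toy activations for a single channel (made-up numbers).
x = [0.5, -1.0, 2.0, 3.5]
mean = sum(x) / len(x)
var = sum((v - mean) ** 2 for v in x) / len(x)

two_step = bn_then_scale(x, mean, var, slope=1.5, bias=0.1)
one_step = merged_bn(x, mean, var, slope=1.5, bias=0.1)
print(all(abs(p - q) < 1e-9 for p, q in zip(two_step, one_step)))
```

The two forms are mathematically equivalent; the practical difference is where the learnable slope and bias are stored, which matters when loading weights between the two layer layouts.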

  • 4 I don’t understand the bin sizes of the pyramid pooling module (1×1, 2×2, 3×3, 6×6) in the paper. Does it mean that, for a bin size of 3×3 for instance, the width and height of each feature map after pooling are both 3? If so, is each feature map square?
    • Yes. The original design is trained with a square input (like 473×473), so in the ppm the pooled maps are all square.
      • i. Let’s say the crop size of your input data is c; it should be a number that fits the equation c = 8x + 1. The feature-map size at conv5_3 is then w = x + 1.
      • ii. For each pooling level L (1, 2, 3, 6), assume the kernel size is k and the stride is s, with k >= s, say k = s + a. In level 1, w = s + a; in level 2, w = 2s + a; in level 3, w = 3s + a; in level 6, w = 6s + a. So s and k in level L should be s = ⌊w/L⌋ and k = s + w % L. You can also modify the pool layer and interp layer to do this calculation automatically.
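
The rule in ii can be checked with a short sketch. The function names `ppm_pool_params` and `pooled_size` are hypothetical, and the output-size formula assumes ceil-mode pooling without padding; this illustrates the arithmetic above and is not code from the PSPNet repo.

```python
import math

def ppm_pool_params(crop_size, levels=(1, 2, 3, 6)):
    """Return (w, {level: (kernel, stride)}) for a square crop of size c = 8x + 1."""
    if (crop_size - 1) % 8 != 0:
        raise ValueError("crop size must satisfy c = 8x + 1")
    x = (crop_size - 1) // 8
    w = x + 1                          # feature-map size at conv5_3
    params = {}
    for level in levels:
        stride = w // level            # s = floor(w / L)
        kernel = stride + w % level    # k = s + w % L
        params[level] = (kernel, stride)
    return w, params

def pooled_size(w, kernel, stride):
    """Output size of ceil-mode pooling with no padding (assumed formula)."""
    return math.ceil((w - kernel) / stride) + 1

w, params = ppm_pool_params(473)
for level, (kernel, stride) in params.items():
    print(level, kernel, stride, pooled_size(w, kernel, stride))
```

For a 473 crop this gives w = 60, with kernel = stride = 60, 30, 20 and 10 for levels 1, 2, 3 and 6, and each level pools down to an L×L map.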

Others’ discussion

  • [From Fromandto]:
    • I am training a 713-resolution pspnet on 2 x 12gb titan x with batch size 1, and it seems almost all of the memory is used.
    • So I guess training with batchsize 16 would require about 32 titan x cards (12gb memory)?
    • I cannot find details in the paper about how many gpus were used, so I want to confirm how many gpus are required to train with batchsize 16 according to your experience.
    • I really wonder what the quantitative performance improvement is between batchsize 16 and batchsize 1, because in the paper and in this thread you emphasize that batchsize matters, yet in deeplab-v2 (and according to my own experience) training with batchsize 1 also works (to some extent). Do I really need to use batchsize 16 (and potentially 32 cards?) to achieve ideal performance?
    • [From huaxinxiao]
    • If your batchsize is 1, the batch normalization layer may not work. However, the bn layer seems important to the performance of PSPNet.
    • [From Fromandto]
    • This is exactly what I am concerned about … but I just don’t have 32 gpus (or is there something wrong with my setting, so that 4 gpus are actually enough to train with a batch of 16?)
    • [From huaxinxiao]
    • A smaller crop size (<321) will work on 4 gpus. Besides, you should use the OpenMPI-based multi-GPU caffe to gather the bn parameters.
    • I am using the training script of deeplab-v2. It is compatible.
    • [From kardoszc]
    • These cudnn_layers are based on cudnn4.0. If you want to use cudnn5.0 (especially when you train your model on a newer gpu like ttx or 1080, on which cudnn4.0 didn’t work well), you need to replace these cudnn_layers with the latest caffe layers:
cp new_caffe/include/caffe/util/cudnn.hpp ./include/caffe/util/cudnn.hpp 
cp new_caffe/include/caffe/layers/cudnn_* ./include/caffe/layers/ 
cp new_caffe/src/caffe/layers/cudnn_* ./src/caffe/layers/
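
Returning to the batch-normalization point raised in this thread: the idea of gathering bn data across GPU cards can be sketched in plain Python. This is a simulation and not the authors’ MPI code; the lists stand in for per-GPU activations, and the function names `local_sums` and `global_mean_var` are hypothetical. Each card contributes only its count, sum and sum of squares, and the combined statistics equal those of the full batch.

```python
def local_sums(values):
    """Partial statistics computed on one card: count, sum, sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)

def global_mean_var(partials):
    """Combine per-card partial sums into the mean and variance of the whole batch."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    var = total_sq / n - mean * mean   # E[x^2] - (E[x])^2
    return mean, var

# Two simulated cards, each holding a tiny per-card batch (made-up numbers).
gpu_batches = [[1.0, 2.0], [3.0, 4.0]]

# Per-card means differ a lot from each other (1.5 vs 3.5) ...
print([sum(b) / len(b) for b in gpu_batches])

# ... while the gathered statistics describe the full batch of four values.
mean, var = global_mean_var([local_sums(b) for b in gpu_batches])
print(mean, var)
```

In a real multi-GPU setup the summation over cards would be done by a collective communication step rather than a Python loop; the arithmetic of combining the partial sums is the same.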


