From GitHub Issues of PSPNet
For training, the issues mainly concern the bn (batch normalization) layer:
- 1 Should the parameters (mean, variance, slope, bias) of the ‘bn’ layer be updated?
- If you are working on the same (or a similar) dataset as the released model, you can simply fix the ‘bn’ layers during fine-tuning;
- if not, you may need to update these parameters.
- 2 What needs more attention when updating the parameters?
- The batch size used for batch normalization is important. It is best to keep it above 16 in each computation step, because the mean and variance of the current batch need to stay close to the global statistics that are used at test time.
- However, semantic segmentation is memory-hungry, and keeping a large crop size (which depends on the dataset) can force a small batch size on each GPU card.
So during training we use MPI to gather data from the different GPU cards and then perform the bn operation.
The current official Caffe does not seem to support such communication. We are trying to make our training code compatible with BVLC Caffe; in the meantime you can have a look at yjxiong's version of Caffe, which is an OpenMPI-based multi-GPU version. If you are working on other datasets, other platforms may support such bn communication.
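The idea of gathering data across cards before normalizing can be illustrated with a minimal sketch. This is not the repo's MPI code; it only shows, in plain Python, how per-GPU partial sums could be combined into one global mean and variance. The function names and the sample activations are hypothetical.

```python
def local_stats(values):
    """Partial sums one GPU would contribute: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)


def global_mean_var(per_gpu_stats):
    """Combine per-GPU partial sums into a global mean and biased variance."""
    n = sum(s[0] for s in per_gpu_stats)
    total = sum(s[1] for s in per_gpu_stats)
    total_sq = sum(s[2] for s in per_gpu_stats)
    mean = total / n
    var = total_sq / n - mean * mean  # E[x^2] - (E[x])^2
    return mean, var


# Hypothetical activations of one channel, two samples on each of four GPUs.
gpu_activations = [[0.5, 1.5], [2.0, 4.0], [-1.0, 1.0], [3.0, 5.0]]

# In a real setup the partial sums would be exchanged between processes
# (for example with an all-reduce); here they are simply collected in a list.
mean, var = global_mean_var([local_stats(a) for a in gpu_activations])
print(mean, var)
```

In this toy data the per-GPU means are 1.0, 3.0, 0.0 and 4.0, while the mean over all eight samples is 2.0, which is why statistics computed on a tiny per-card batch can drift far from the global ones.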
- 3 Why is mult_lr set so large in the conv layers that follow the conv5_3/relu layer? e.g.
And what are the differences between your BN layer and the one from BVLC/caffe?
- We assume that newly added layers should have a larger lr than layers that are initialized from pretrained models.
As for the bn question, the one in ‘BVLC/caffe’ does the normalization first and is then followed by a ‘scale’ layer that learns the transform. The one in this repo merges the ‘slope’ and ‘bias’ of the ‘scale’ layer into the bn layer.
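The difference between the two BN designs can be sketched numerically. The snippet below is an illustration in plain Python, not the actual Caffe layer code; the function names and the `EPS` value are assumptions. It shows that normalizing and then applying a separate scale step gives the same result as a single layer that holds mean, variance, slope and bias together.

```python
import math

EPS = 1e-5  # assumed small constant for numerical stability


def batchnorm_only(x, mean, var):
    """BVLC-style first step: normalize with the given statistics."""
    return (x - mean) / math.sqrt(var + EPS)


def scale_only(x_hat, slope, bias):
    """BVLC-style second step: a separate learned affine transform."""
    return slope * x_hat + bias


def merged_bn(x, mean, var, slope, bias):
    """Single layer holding mean, variance, slope and bias together."""
    return slope * (x - mean) / math.sqrt(var + EPS) + bias


x, mean, var, slope, bias = 3.0, 1.0, 4.0, 2.0, 0.5
two_step = scale_only(batchnorm_only(x, mean, var), slope, bias)
one_step = merged_bn(x, mean, var, slope, bias)
print(two_step, one_step)
```

Mathematically the two forms are equivalent; the practical difference is where the four parameters are stored, which matters when loading weights from one implementation into the other.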
- 4 I don’t understand the bin sizes of the pyramid pooling module (1*1, 2*2, 3*3, 6*6) in the paper. Does it mean that, for a bin size of 3*3 for instance, the width and height of each feature map after pooling are both 3? If so, is each feature map square?
- Yes. The original design is trained with a square input (like 473*473), so in the ppm the pooled maps are all square.
- i. Say the crop size of your input data is c; it should be a number that satisfies c = 8x + 1. The feature map size at conv5_3 is then w = x + 1.
- ii. For each pooling level L (1, 2, 3, 6), let the kernel size be k and the stride be s, with k >= s, say k = s + a:
In level 1, w = s + a;
In level 2, w = 2s + a;
In level 3, w = 3s + a;
In level 6, w = 6s + a;
So the s and k for level L should be s = [w/L] (the integer part of w/L) and k = s + w % L. You can also modify the pool layer and interp layer to calculate these automatically.
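The rule above can be written as a short helper. This is a minimal sketch of the arithmetic only, with hypothetical function names; it does not touch any Caffe layer definitions.

```python
def conv5_size(crop_size):
    """Feature map size w at conv5_3 for a crop size c = 8x + 1, so w = x + 1."""
    if (crop_size - 1) % 8 != 0:
        raise ValueError("crop size must satisfy c = 8x + 1")
    return (crop_size - 1) // 8 + 1


def ppm_pool_params(crop_size, levels=(1, 2, 3, 6)):
    """Return {level: (kernel, stride)} using s = w // L and k = s + w % L."""
    w = conv5_size(crop_size)
    params = {}
    for level in levels:
        stride = w // level
        kernel = stride + w % level
        params[level] = (kernel, stride)
    return params


print(ppm_pool_params(473))
print(ppm_pool_params(713))
```

For a crop size of 473 this gives w = 60 and (kernel, stride) pairs of (60, 60), (30, 30), (20, 20) and (10, 10) for levels 1, 2, 3 and 6. With these values (L - 1) * s + k equals w, so each level covers the whole map with exactly L bins per side.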
- [From Fromandto]:
- I am training a 713-resolution PSPNet on 2 x 12 GB Titan X cards with batch size 1, and almost all of the memory seems to be used.
- So I guess training with batch size 16 would require about 32 Titan X cards (12 GB memory each)?
- I cannot find details in the paper about how many GPUs were used, so I want to confirm: in your experience, how many GPUs are required to train with batch size 16?
- I really wonder what the quantitative performance difference is between batch size 16 and batch size 1. In the paper and in this thread you emphasize that batch size matters, yet in deeplab-v2 (and in my own experience) training with batch size 1 also works (to some extent). Do I really need batch size 16 (and potentially 32 cards?) to achieve ideal performance?
- [From huaxinxiao]
- If your batch size is 1, the batch normalization layer may not work. However, the bn layer seems important to the performance of PSPNet.
- [From Fromandto]
- This is exactly what I am concerned about … but I just don’t have 32 GPUs (or is there something wrong with my setup, so that 4 GPUs would be enough to train with batch size 16?)
- [From huaxinxiao]
- A smaller crop size (<321) will work on 4 GPUs. Besides, you should use the OpenMPI-based multi-GPU Caffe to gather the bn parameters.
- I am using the training script of deeplab-v2. It is compatible.
- [From kardoszc]
- These cudnn_layers are based on cuDNN 4.0. If you want to use cuDNN 5.0 (especially when you train your model on a newer GPU such as a Titan X or 1080, on which cuDNN 4.0 did not work well), you need to replace these cudnn_layers with the latest Caffe layers:
cp new_caffe/include/caffe/util/cudnn.hpp ./include/caffe/util/cudnn.hpp
cp new_caffe/include/caffe/layers/cudnn_* ./include/caffe/layers/
cp new_caffe/src/caffe/layers/cudnn_* ./src/caffe/layers/