
TensorFlow semantic segmentation series: DeepLab V3/V3+ in practice

Author: sparkexpert · Published 2019-05-26 14:00:22

Semantic segmentation is one of the core tasks in high-level, pixel-wise image understanding, and an important technical foundation for autonomous driving. I have already reproduced earlier work in this area; see the previous post "空洞卷积与DeeplabV2实现图像语义分割的测试(tensorflow)" (atrous convolution and DeepLab V2 for image semantic segmentation with TensorFlow). Recently Google released DeepLab V3 and its upgraded version, DeepLab V3+, and integrated both into its models repository, so this post runs an integration test of that library.
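A straightforward way to run such an integration test is to feed a single image through one of the released frozen graphs. The sketch below is a minimal, hedged example assuming TensorFlow 1.x; the tensor names (`ImageTensor:0`, `SemanticPredictions:0`) and the 513-pixel input size follow the official deeplab demo notebook of that period, and the file paths, as well as the helper names `load_graph` and `segment`, are placeholders of my own, so verify them against the checkpoint you actually download.

```python
# Minimal sketch: run a pretrained DeepLab frozen graph on one image (TF 1.x).
# Tensor names and input size follow the official demo notebook at the time;
# paths below are placeholders.
import numpy as np
import tensorflow as tf
from PIL import Image

INPUT_TENSOR = 'ImageTensor:0'            # uint8 image batch, shape [1, H, W, 3]
OUTPUT_TENSOR = 'SemanticPredictions:0'   # per-pixel class labels, shape [1, H, W]
INPUT_SIZE = 513                          # crop size the released models expect

def load_graph(pb_path):
    """Load an exported frozen_inference_graph.pb into a new graph."""
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
    return graph

def segment(graph, image_path):
    """Resize the image to fit the export size and run one forward pass."""
    image = Image.open(image_path).convert('RGB')
    scale = float(INPUT_SIZE) / max(image.size)
    resized = image.resize(
        (int(scale * image.size[0]), int(scale * image.size[1])),
        Image.BILINEAR)
    with tf.Session(graph=graph) as sess:
        seg_map = sess.run(
            OUTPUT_TENSOR,
            feed_dict={INPUT_TENSOR: [np.asarray(resized)]})[0]
    return resized, seg_map

if __name__ == '__main__':
    g = load_graph('deeplab_model/frozen_inference_graph.pb')  # placeholder path
    img, labels = segment(g, 'test.jpg')                       # placeholder image
    print('predicted classes:', np.unique(labels))
```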

An introduction to the DeepLab V1 through V3+ series can be found on the models page, quoted below:

DeepLab is a state-of-the-art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g., person, dog, cat and so on) to every pixel in the input image. The current implementation includes the following features:

  1. DeepLabv1 [1]: We use atrous convolution to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks.
  2. DeepLabv2 [2]: We use atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields-of-views.
  3. DeepLabv3 [3]: We augment the ASPP module with image-level feature [5, 6] to capture longer range information. We also include batch normalization [7] parameters to facilitate the training. In particular, we apply atrous convolution to extract output features at different output strides during training and evaluation, which efficiently enables training BN at output stride = 16 and attains a high performance at output stride = 8 during evaluation.
  4. DeepLabv3+ [4]: We extend DeepLabv3 to include a simple yet effective decoder module to refine the segmentation results especially along object boundaries. Furthermore, in this encoder-decoder structure one can arbitrarily control the resolution of extracted encoder features by atrous convolution to trade-off precision and runtime.

1. DeepLab V3

The DeepLab V3 paper is titled "Rethinking Atrous Convolution for Semantic Image Segmentation". As the title suggests, it optimizes the atrous convolution module through two main extensions (both are sketched in code after the two descriptions below):

The first extension, going deeper (cascaded module): the last convolution block of ResNet (Block 4) is duplicated and the copies are appended sequentially at the end of the network (Block 5 + 6 + 7 in the paper's figure) to capture more multi-scale cascaded context. To keep the feature map at the same size (output stride = 16, meaning the input image is 16 times larger than the feature map), the atrous rates used in the later blocks must grow exponentially.

The second extension, ASPP (parallel module): parallel convolution branches are attached to the final feature map, each branch using atrous convolution with a different rate, and all of the resulting information is merged before prediction. ASPP was already proposed in the original DeepLab, but here the authors additionally apply batch normalization within ASPP and concatenate an image-level feature obtained by global average pooling of the feature map; experiments show that these small tricks are effective.
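As a rough illustration of these two ideas, here is a hedged sketch written with plain tf.layers (TF 1.x) rather than the slim-based official code. The function names `aspp` and `cascaded_blocks`, the branch depth of 256 and the rates (6, 12, 18) follow the paper's defaults, but the exact layer arrangement is illustrative only.

```python
# Sketch of the two DeepLab V3 extensions: an ASPP head with an image-level
# feature branch, and extra cascaded blocks with exponentially growing
# atrous rates so the feature map stays at output stride 16.
import tensorflow as tf

def aspp(features, depth=256, rates=(6, 12, 18), is_training=True):
    """Parallel atrous branches + image-level feature, then 1x1 projection."""
    size = tf.shape(features)[1:3]
    branches = [tf.layers.conv2d(features, depth, 1, padding='same')]   # 1x1 branch
    for rate in rates:                                                  # 3x3 atrous branches
        branches.append(tf.layers.conv2d(
            features, depth, 3, dilation_rate=rate, padding='same'))
    # Image-level feature: global average pool, 1x1 conv, upsample back.
    pooled = tf.reduce_mean(features, axis=[1, 2], keepdims=True)
    pooled = tf.layers.conv2d(pooled, depth, 1)
    pooled = tf.image.resize_bilinear(pooled, size)
    branches.append(pooled)
    x = tf.concat(branches, axis=-1)
    x = tf.layers.conv2d(x, depth, 1, padding='same')
    x = tf.layers.batch_normalization(x, training=is_training)  # BN after ASPP
    return tf.nn.relu(x)

def cascaded_blocks(features, depth=256, base_rate=2, num_blocks=3):
    """Blocks stacked after Block 4 with rates 2, 4, 8, ... (output stride 16)."""
    x = features
    for i in range(num_blocks):
        x = tf.layers.conv2d(
            x, depth, 3, dilation_rate=base_rate * (2 ** i),
            padding='same', activation=tf.nn.relu)
    return x
```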

2. DeepLab V3+

The corresponding paper is "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation". To fuse multi-scale information, it adopts the encoder-decoder structure commonly used in semantic segmentation. Within this encoder-decoder architecture, the resolution of the features extracted by the encoder can be controlled arbitrarily through atrous convolution, trading off accuracy against runtime. For the segmentation task it uses an Xception backbone and applies depthwise separable convolution in both the ASPP and decoder modules, which improves the speed and robustness of the encoder-decoder network.
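To make the V3+ additions concrete, here is another hedged sketch of an atrous depthwise separable convolution and the light decoder that fuses the upsampled encoder output with a low-level feature map. The helper names `sep_conv` and `decoder` are my own; the 48-channel reduction and 256-filter refinement follow the paper's defaults, but the graph differs from the official implementation.

```python
# Sketch of the DeepLab V3+ building blocks: atrous depthwise separable
# convolution and the simple decoder module.
import tensorflow as tf

def sep_conv(x, filters, rate=1):
    """3x3 depthwise separable convolution, optionally atrous."""
    return tf.layers.separable_conv2d(
        x, filters, 3, padding='same',
        dilation_rate=rate, activation=tf.nn.relu)

def decoder(encoder_out, low_level_feat, num_classes):
    """Reduce low-level channels, upsample the encoder output 4x,
    concatenate, refine with separable convs, then predict logits."""
    low = tf.layers.conv2d(low_level_feat, 48, 1, padding='same')  # channel reduction
    size = tf.shape(low)[1:3]
    x = tf.image.resize_bilinear(encoder_out, size)                # 4x upsample
    x = tf.concat([x, low], axis=-1)
    x = sep_conv(x, 256)
    x = sep_conv(x, 256)
    logits = tf.layers.conv2d(x, num_classes, 1, padding='same')
    # A final bilinear upsample (by 4) recovers the input resolution.
    return logits
```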

3. Experiments

Google has released DeepLab segmentation architectures based on MobileNet-V2 and Xception, together with pretrained models for several datasets. So far the MobileNet-V2 based models only reach DeepLab V2-level results; as the repository notes, "MobileNet-v2 based models do not employ ASPP and decoder modules for fast computation."
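Fetching one of these pretrained checkpoints can be scripted as below. The tarball name is only an assumed example; take the exact filename from the model zoo page (research/deeplab/g3doc/model_zoo.md) for the backbone and dataset you want.

```python
# Small helper for downloading and extracting a released DeepLab checkpoint.
# The tarball name is an assumed example; check the model zoo page for the
# current list of releases.
import os
import tarfile
import urllib.request

DOWNLOAD_BASE = 'http://download.tensorflow.org/models/'
TARBALL = 'deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz'  # assumed example name

def fetch_model(dest_dir='deeplab_model'):
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, TARBALL)
    if not os.path.exists(path):
        urllib.request.urlretrieve(DOWNLOAD_BASE + TARBALL, path)
    with tarfile.open(path) as tar:
        tar.extractall(dest_dir)   # extracted folder contains frozen_inference_graph.pb
    return dest_dir

if __name__ == '__main__':
    print('model extracted to', fetch_model())
```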

The experimental architecture is as follows:

The test results are as follows:

Shared from the author's personal blog via the Tencent Cloud self-media sharing program; originally published 2018-04-16.
