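Running a convolutional neural network on mobile phones for document detection (Part 2): from VGG to MobileNetV2, a knowledge review (continued)

From MobileNet V1 to MobileNet V2

ResNet, Inception and Xception aim for higher accuracy first, and then try to strike a balance between model size, inference speed and training speed. If some loss of accuracy is acceptable in exchange for a smaller and faster model, the result is MobileNet and similar architectures designed to run on phones or embedded devices.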

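MobileNet V1 (https://arxiv.org/pdf/1704.04861.pdf) and MobileNet V2 (https://arxiv.org/pdf/1801.04381.pdf) both build their convolutional layers on depthwise separable convolution (similar to Xception, although not identical to the separable convolution Xception uses). This is one key reason they are small and fast. The other is a layer structure that was carefully designed and tuned through experiments. The rest of this section gives a code implementation of both versions, following the papers.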
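To make the "small and fast" point concrete, the sketch below counts weights and multiply-adds for a standard convolution and for a depthwise separable one. The layer size used (3x3 kernel, 256 input and output channels, 28x28 output) is an arbitrary assumption chosen for illustration, and biases are ignored.

```python
def standard_conv_cost(kernel, in_ch, out_ch, out_h, out_w):
    """Weights and multiply-adds of a standard kernel x kernel convolution (no bias)."""
    params = kernel * kernel * in_ch * out_ch
    mult_adds = params * out_h * out_w
    return params, mult_adds


def depthwise_separable_cost(kernel, in_ch, out_ch, out_h, out_w):
    """Same quantities for a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise_params = kernel * kernel * in_ch  # one kernel x kernel filter per input channel
    pointwise_params = in_ch * out_ch           # the 1x1 conv mixes the channels
    params = depthwise_params + pointwise_params
    mult_adds = params * out_h * out_w
    return params, mult_adds


# Illustrative layer size (an assumption, not taken from the article)
std_params, std_madds = standard_conv_cost(3, 256, 256, 28, 28)
sep_params, sep_madds = depthwise_separable_cost(3, 256, 256, 28, 28)
ratio = sep_madds / std_madds  # equals 1/out_ch + 1/kernel**2

print(std_params, sep_params, round(ratio, 4))
```

For a 3x3 kernel the ratio is close to 1/9, so the separable form needs roughly eight to nine times fewer multiply-adds for this layer.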

MobileNet V1

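The overall structure of MobileNet V1 is not particularly complicated. Like VGG, the layers are simply chained one after another; the differences are mainly inside each layer, as shown in the figure below:

The figure has no arrows showing the direction of data flow, but anyone with basic experience of convolutional neural networks can tell that data flows from top to bottom. The left diagram is a standard convolutional layer, similar to the _vgg_conv2d function in the HED network shown earlier. (Recall the earlier discussion on the order of Batch Normalization and ReLU: Batch Normalization can be placed after the activation function, but many papers still place it before, so the code here follows the papers strictly.) The right diagram is equivalent to a separable convolution, but with Batch Normalization applied twice: once after the depthwise convolution and once after the pointwise convolution.

The paper describes the overall structure with the following table:

Below is a simple code implementation: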

import tensorflow as tf  # TensorFlow 1.x; the code relies on tf.contrib and tf.layers

def mobilenet_v1(inputs, alpha, is_training):
    if alpha not in [0.25, 0.50, 0.75, 1.0]:
        raise ValueError('alpha must be one of '
                         '`0.25`, `0.50`, `0.75` or `1.0` only.')

    filter_initializer = tf.contrib.layers.xavier_initializer()    
    def _conv2d(inputs, filters, kernel_size, stride, scope=''):
        with tf.variable_scope(scope):
            outputs = tf.layers.conv2d(inputs,
                                    filters, 
                                    kernel_size, 
                                    strides=(stride, stride),
                                    padding='same', 
                                    activation=None,
                                    use_bias=False, 
                                    kernel_initializer=filter_initializer)

            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            outputs = tf.nn.relu(outputs)        
        return outputs    
    def _mobilenet_v1_conv2d(inputs, 
                          pointwise_conv_filters, 
                          depthwise_conv_kernel_size,
                          stride, # stride is just for depthwise convolution
                          scope=''):
        with tf.variable_scope(scope):
            with tf.variable_scope('depthwise_conv'):                
                '''
                The tf.layers module has a tf.layers.separable_conv2d function,
                but its internal flow is depthwise convolution --> pointwise convolution --> activation func,
                whereas a MobileNet V1 style layer needs
                depthwise conv --> batch norm --> relu --> pointwise conv --> batch norm --> relu,
                so the flow has to be assembled some other way.
                One option is tf.nn.depthwise_conv2d, but that API is low-level and makes the code clumsy.
                Another workable option is tf.contrib.layers.separable_conv2d:
                if its second argument, num_outputs, is set to None,
                only the internal depthwise conv2d part runs and the pointwise conv2d part is skipped.
                That makes it possible to assemble the layer structure MobileNet V1 needs.

                TensorFlow provides four APIs named separable_conv2d, with subtle differences between them;
                interested readers can consult the documentation:
                tf.contrib.layers.separable_conv2d [Aliases tf.contrib.layers.separable_convolution2d]
                VS
                tf.keras.backend.separable_conv2d
                VS
                tf.layers.separable_conv2d
                VS
                tf.nn.separable_conv2d
                '''
                outputs = tf.contrib.layers.separable_conv2d(
                            inputs,            
                            None, # ref https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.py
                            depthwise_conv_kernel_size,
                            depth_multiplier=1, # set to 1, as described in the paper
                            stride=(stride, stride),
                            padding='SAME',
                            activation_fn=None,
                            weights_initializer=filter_initializer,
                            biases_initializer=None)

                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = tf.nn.relu(outputs)     
            with tf.variable_scope('pointwise_conv'):
                # In the paper, alpha scales down the number of output channels at the pointwise conv of every layer
                pointwise_conv_filters = int(pointwise_conv_filters * alpha)
                outputs = tf.layers.conv2d(outputs,
                                        pointwise_conv_filters,
                                        (1, 1), 
                                        padding='same', 
                                        activation=None,
                                        use_bias=False, 
                                        kernel_initializer=filter_initializer)

                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = tf.nn.relu(outputs)        
                return outputs    
    def _avg_pool2d(inputs, scope=''):
        inputs_shape = inputs.get_shape().as_list()        
        assert len(inputs_shape) == 4

        pool_height = inputs_shape[1]
        pool_width = inputs_shape[2]        
        with tf.variable_scope(scope):
            outputs = tf.layers.average_pooling2d(inputs,
                                      [pool_height, pool_width],
                                      strides=(1, 1),
                                      padding='valid')        
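        return outputs

    '''
    A network built for classification can usually also serve as the base architecture of networks for other tasks.
    To make the code reusable, only the main body made of convolutional layers is implemented here;
    callers use the returned output and end_points as they need.
    For a classification task, for example, the function can be used like this: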

    image_height = 224
    image_width = 224
    image_channels = 3

    x = tf.placeholder(tf.float32, [None, image_height, image_width, image_channels])
    is_training = tf.placeholder(tf.bool, name='is_training')

    output, net = mobilenet_v1(x, 1.0, is_training)
    print('output shape is: %r' % (output.get_shape().as_list()))

    output = tf.layers.flatten(output)
    output = tf.layers.dense(output,
                        units=1024, # 1024 class
                        activation=None,
                        use_bias=True,
                        kernel_initializer=tf.contrib.layers.xavier_initializer())
    print('output shape is: %r' % (output.get_shape().as_list()))
    '''
    with tf.variable_scope('mobilenet', 'mobilenet', [inputs]):
        end_points = {}
        net = inputs 

        net = _conv2d(net, 32, [3, 3], stride=2, scope='block0')
        end_points['block0'] = net
        net = _mobilenet_v1_conv2d(net, 64, [3, 3], stride=1, scope='block1')
        end_points['block1'] = net

        net = _mobilenet_v1_conv2d(net, 128, [3, 3], stride=2, scope='block2')
        end_points['block2'] = net
        net = _mobilenet_v1_conv2d(net, 128, [3, 3], stride=1, scope='block3')
        end_points['block3'] = net

        net = _mobilenet_v1_conv2d(net, 256, [3, 3], stride=2, scope='block4')
        end_points['block4'] = net
        net = _mobilenet_v1_conv2d(net, 256, [3, 3], stride=1, scope='block5')
        end_points['block5'] = net

        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=2, scope='block6')
        end_points['block6'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block7')
        end_points['block7'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block8')
        end_points['block8'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block9')
        end_points['block9'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block10')
        end_points['block10'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block11')
        end_points['block11'] = net

        net = _mobilenet_v1_conv2d(net, 1024, [3, 3], stride=2, scope='block12')
        end_points['block12'] = net
        net = _mobilenet_v1_conv2d(net, 1024, [3, 3], stride=1, scope='block13')
        end_points['block13'] = net

        output = _avg_pool2d(net, scope='output')    
        return output, end_points
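
As a quick sanity check on the listing above, the following sketch traces the spatial size and channel count after every block without building a TensorFlow graph. It assumes 'same' padding, so a stride-s layer maps a size n to ceil(n / s), and it mirrors the listing in applying alpha only to the pointwise filters (block0 keeps 32 channels). The helper name trace_mobilenet_v1 is hypothetical.

```python
import math

# (pointwise filters, stride) for block1 .. block13, read off the listing above
V1_BLOCKS = [(64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
             (512, 1), (512, 1), (512, 1), (512, 1), (512, 1),
             (1024, 2), (1024, 1)]


def trace_mobilenet_v1(input_size, alpha):
    """Return a list of (spatial size, channels) after block0 .. block13."""
    size = math.ceil(input_size / 2)  # block0: 3x3 conv with stride 2
    shapes = [(size, 32)]             # the listing does not scale block0 by alpha
    for filters, stride in V1_BLOCKS:
        size = math.ceil(size / stride)
        shapes.append((size, int(filters * alpha)))
    return shapes


shapes = trace_mobilenet_v1(224, 0.5)
for index, (size, channels) in enumerate(shapes):
    print('block%d: %dx%dx%d' % (index, size, size, channels))
```

With a 224x224 input the last block is 7x7, which is why the average pooling in _avg_pool2d reads its window size from the tensor shape instead of hard-coding it.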

MobileNet V2

MobileNet V2 changes considerably more. First, it introduces two new layer structures, as shown in the figure below:

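One obvious difference is that the structure on the left introduces a residual connection. In addition, both structures add a 1x1 convolution before the depthwise convolution. In the earlier examples, 1x1 convolution was always used to reduce dimensionality, but in MobileNet V2 the 1x1 convolution before the depthwise convolution increases the number of channels; this corresponds to the expansion factor parameter in the paper. There is still a 1x1 convolution after the depthwise convolution, but it is not followed by an activation function and is only a linear transformation. For that reason it is not called a pointwise convolution here; it corresponds to the 1x1 projection convolution in the paper.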
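
The channel flow just described (expand, depthwise, linear projection) can be sketched with plain arithmetic. The helper below is hypothetical and only counts convolution weights (no biases, no Batch Normalization parameters); the 24-channel example is an assumption chosen for illustration.

```python
def inverted_residual_summary(in_ch, out_ch, stride, expansion=6, kernel=3):
    """Channel counts and conv weights of one expand -> depthwise -> project block."""
    expanded = in_ch * expansion  # the 1x1 expansion conv widens the tensor
    conv_params = {
        'expansion_1x1': in_ch * expanded,
        'depthwise': kernel * kernel * expanded,  # one filter per expanded channel
        'projection_1x1': expanded * out_ch,      # linear, no activation afterwards
    }
    return {
        'channels': [in_ch, expanded, expanded, out_ch],
        'conv_params': conv_params,
        'total_params': sum(conv_params.values()),
        # in the paper's figure only the stride-1 block carries the shortcut
        'has_shortcut': stride == 1,
    }


summary = inverted_residual_summary(24, 24, stride=1)
print(summary['channels'], summary['total_params'], summary['has_shortcut'])
```

The tensor is narrow at both ends of the block and wide in the middle, the reverse of a classic ResNet bottleneck, which is where the name "inverted residual" comes from.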

The overall structure of the network is described by the following table:

The code implementation is as follows:

def mobilenet_v2_func_blocks(is_training):
    filter_initializer = tf.contrib.layers.xavier_initializer()
    activation_func = tf.nn.relu6    
    def conv2d(inputs, filters, kernel_size, stride, scope=''):
        with tf.variable_scope(scope):            
            with tf.variable_scope('conv2d'):
                outputs = tf.layers.conv2d(inputs,
                                        filters, 
                                        kernel_size, 
                                        strides=(stride, stride),
                                        padding='same', 
                                        activation=None,
                                        use_bias=False, 
                                        kernel_initializer=filter_initializer)

                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = tf.nn.relu(outputs)            
        return outputs

    def _1x1_conv2d(inputs, filters, stride):
        kernel_size = [1, 1]        
        with tf.variable_scope('1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs,
                                    filters, 
                                    kernel_size, 
                                    strides=(stride, stride),
                                    padding='same', 
                                    activation=None,
                                    use_bias=False, 
                                    kernel_initializer=filter_initializer)

            outputs = tf.layers.batch_normalization(outputs, training=is_training)            # no activation_func
        return outputs    
    def expansion_conv2d(inputs, expansion, stride):
        input_shape = inputs.get_shape().as_list()        
        assert len(input_shape) == 4
        filters = input_shape[3] * expansion

        kernel_size = [1, 1]        
        with tf.variable_scope('expansion_1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs,
                                    filters, 
                                    kernel_size, 
                                    strides=(stride, stride),
                                    padding='same', 
                                    activation=None,
                                    use_bias=False, 
                                    kernel_initializer=filter_initializer)

            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            outputs = activation_func(outputs)        
        return outputs    
    def projection_conv2d(inputs, filters, stride):
        kernel_size = [1, 1]        
        with tf.variable_scope('projection_1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs,
                                    filters, 
                                    kernel_size, 
                                    strides=(stride, stride),
                                    padding='same', 
                                    activation=None,
                                    use_bias=False, 
                                    kernel_initializer=filter_initializer)

            outputs = tf.layers.batch_normalization(outputs, training=is_training)            # no activation_func
        return outputs

    def depthwise_conv2d(inputs,
                         depthwise_conv_kernel_size,
                         stride):
        with tf.variable_scope('depthwise_conv2d'):
            outputs = tf.contrib.layers.separable_conv2d(
                        inputs,                        
                        None, # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.py
                        depthwise_conv_kernel_size,
                        depth_multiplier=1,
                        stride=(stride, stride),
                        padding='SAME',
                        activation_fn=None,
                        weights_initializer=filter_initializer,
                        biases_initializer=None) 

            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            outputs = activation_func(outputs)        
            return outputs    
    def avg_pool2d(inputs, scope=''):
        inputs_shape = inputs.get_shape().as_list()        
        assert len(inputs_shape) == 4

        pool_height = inputs_shape[1]
        pool_width = inputs_shape[2]
        with tf.variable_scope(scope):
            outputs = tf.layers.average_pooling2d(inputs,
                                            [pool_height, pool_width],
                                            strides=(1, 1),
                                            padding='valid')
        return outputs

    def inverted_residual_block(inputs, 
                            filters, 
                            stride, 
                            expansion=6, 
                            scope=''):
        assert stride == 1 or stride == 2

        depthwise_conv_kernel_size = [3, 3]
        pointwise_conv_filters = filters        
        with tf.variable_scope(scope):
            net = inputs
            net = expansion_conv2d(net, expansion, stride=1)
            net = depthwise_conv2d(net, depthwise_conv_kernel_size, stride=stride)
            net = projection_conv2d(net, pointwise_conv_filters, stride=1)
            if stride == 1:
                # If the channel count of net differs from that of inputs,
                # use a 1x1 convolution to make them equal before adding them
                if net.get_shape().as_list()[3] != inputs.get_shape().as_list()[3]:
                    inputs = _1x1_conv2d(inputs, net.get_shape().as_list()[3], stride=1)

                net = net + inputs
                return net
            else:
                # stride == 2, no shortcut
                return net

    func_blocks = {}
    func_blocks['conv2d'] = conv2d
    func_blocks['inverted_residual_block'] = inverted_residual_block
    func_blocks['avg_pool2d'] = avg_pool2d
    func_blocks['filter_initializer'] = filter_initializer
    func_blocks['activation_func'] = activation_func    
    return func_blocks


def mobilenet_v2(inputs, is_training):
    func_blocks = mobilenet_v2_func_blocks(is_training)
    _conv2d = func_blocks['conv2d'] 
    _inverted_residual_block = func_blocks['inverted_residual_block']
    _avg_pool2d = func_blocks['avg_pool2d']    
    with tf.variable_scope('mobilenet_v2', 'mobilenet_v2', [inputs]):
        end_points = {}
        net = inputs 

        net = _conv2d(net, 32, [3, 3], stride=2, scope='block0_0') # size/2
        end_points['block0'] = net

        net = _inverted_residual_block(net, 16, stride=1, expansion=1, scope='block1_0')
        end_points['block1'] = net

        net = _inverted_residual_block(net, 24, stride=2, scope='block2_0') # size/4
        net = _inverted_residual_block(net, 24, stride=1, scope='block2_1')
        end_points['block2'] = net

        net = _inverted_residual_block(net, 32, stride=2, scope='block3_0') # size/8
        net = _inverted_residual_block(net, 32, stride=1, scope='block3_1') 
        net = _inverted_residual_block(net, 32, stride=1, scope='block3_2')
        end_points['block3'] = net

        net = _inverted_residual_block(net, 64, stride=2, scope='block4_0') # size/16
        net = _inverted_residual_block(net, 64, stride=1, scope='block4_1') 
        net = _inverted_residual_block(net, 64, stride=1, scope='block4_2') 
        net = _inverted_residual_block(net, 64, stride=1, scope='block4_3') 
        end_points['block4'] = net

        net = _inverted_residual_block(net, 96, stride=1, scope='block5_0') 
        net = _inverted_residual_block(net, 96, stride=1, scope='block5_1')
        net = _inverted_residual_block(net, 96, stride=1, scope='block5_2')
        end_points['block5'] = net

        net = _inverted_residual_block(net, 160, stride=2, scope='block6_0') # size/32
        net = _inverted_residual_block(net, 160, stride=1, scope='block6_1') 
        net = _inverted_residual_block(net, 160, stride=1, scope='block6_2') 
        end_points['block6'] = net

        net = _inverted_residual_block(net, 320, stride=1, scope='block7_0')
        end_points['block7'] = net

        net = _conv2d(net, 1280, [1, 1], stride=1, scope='block8_0') 
        end_points['block8'] = net

        output = _avg_pool2d(net, scope='output')    
    return output, end_points
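
The repeated _inverted_residual_block calls above follow a regular pattern, so they can also be written as a table in the (t, c, n, s) style of the paper. The sketch below only expands such a table into the call arguments and checks the resulting feature-map size; the names V2_GROUPS, expand_groups and final_feature_size are hypothetical, and 'same' padding is assumed.

```python
import math

# (expansion t, output channels c, repeats n, stride of the first repeat s),
# read off the calls in the listing above
V2_GROUPS = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def expand_groups(groups):
    """Turn the table into one (filters, stride, expansion) tuple per block."""
    calls = []
    for expansion, filters, repeats, first_stride in groups:
        for i in range(repeats):
            stride = first_stride if i == 0 else 1  # only the first repeat downsamples
            calls.append((filters, stride, expansion))
    return calls


def final_feature_size(input_size, calls):
    """Spatial size after the stride-2 stem conv and all blocks."""
    size = math.ceil(input_size / 2)  # block0_0: 3x3 conv with stride 2
    for _filters, stride, _expansion in calls:
        size = math.ceil(size / stride)
    return size


calls = expand_groups(V2_GROUPS)
print(len(calls), final_feature_size(224, calls))
```

Each tuple in calls corresponds to one _inverted_residual_block(net, filters, stride=stride, expansion=expansion, ...) call, so the network body could be built in a loop if the per-block end_points are not needed.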

MobileNet V2 Style HED

The original HED uses VGG as the base network to obtain feature maps. Following the same idea, the base network can be replaced with MobileNet V2. The code is as follows:

def mobilenet_v2_style_hed(inputs, batch_size, is_training):
    # `const` is not defined in this article; it is presumably the project's own configuration module
    if const.use_kernel_regularizer:
        weights_regularizer = tf.contrib.layers.l2_regularizer(scale=0.0001)
    else:
        weights_regularizer = None

    ####################################################
    func_blocks = mobilenet_v2_func_blocks(is_training)    
    # print('============ func_blocks are: %r' % func_blocks)
    _conv2d = func_blocks['conv2d'] 
    _inverted_residual_block = func_blocks['inverted_residual_block']
    _avg_pool2d = func_blocks['avg_pool2d']
    filter_initializer = func_blocks['filter_initializer']
    activation_func = func_blocks['activation_func']    
    ####################################################

    def _dsn_1x1_conv2d(inputs, filters):
        kernel_size = [1, 1]
        outputs = tf.layers.conv2d(inputs,
                                   filters,
                                   kernel_size, 
                                   padding='same', 
                                   activation=None, ## no activation
                                   use_bias=False, 
                                   kernel_initializer=filter_initializer,
                                   kernel_regularizer=weights_regularizer)

        outputs = tf.layers.batch_normalization(outputs, training=is_training)        
        ## no activation
        return outputs    
    def _output_1x1_conv2d(inputs, filters):
        kernel_size = [1, 1]
        outputs = tf.layers.conv2d(inputs,
                                   filters,
                                   kernel_size, 
                                   padding='same', 
                                   activation=None, ## no activation
                                   use_bias=True, ## use bias
                                   kernel_initializer=filter_initializer,
                                   kernel_regularizer=weights_regularizer)        
        ## no batch normalization
        ## no activation

        return outputs    
    def _dsn_deconv2d_with_upsample_factor(inputs, filters, upsample_factor):
        ## https://github.com/s9xie/hed/blob/master/examples/hed/train_val.prototxt
        ## judging from this original implementation, kernel_size is computed as follows
        kernel_size = [2 * upsample_factor, 2 * upsample_factor]
        outputs = tf.layers.conv2d_transpose(inputs,
                                             filters, 
                                             kernel_size, 
                                             strides=(upsample_factor, upsample_factor), 
                                             padding='same', 
                                             activation=None, ## no activation
                                             use_bias=True, ## use bias
                                             kernel_initializer=filter_initializer,
                                             kernel_regularizer=weights_regularizer)        
        ## Conceptually, deconv2d is already the final output layer; only a 1x1 conv2d follows, to fuse the 5 deconv2d outputs,
        ## so batch normalization is no longer needed here

        return outputs    
    with tf.variable_scope('hed', 'hed', [inputs]):
        end_points = {}
        net = inputs        
        ## mobilenet v2 as base net
        with tf.variable_scope('mobilenet_v2'):            
            # The standard mobilenet v2 does not have these two layers;
            # they are added here to get a feature map with the same size as the input image
            net = _conv2d(net, 3, [3, 3], stride=1, scope='block0_0')
            net = _conv2d(net, 6, [3, 3], stride=1, scope='block0_1')

            dsn1 = net
            net = _conv2d(net, 12, [3, 3], stride=2, scope='block0_2') # size/2

            net = _inverted_residual_block(net, 6, stride=1, expansion=1, scope='block1_0')

            dsn2 = net
            net = _inverted_residual_block(net, 12, stride=2, scope='block2_0') # size/4
            net = _inverted_residual_block(net, 12, stride=1, scope='block2_1')

            dsn3 = net
            net = _inverted_residual_block(net, 24, stride=2, scope='block3_0') # size/8
            net = _inverted_residual_block(net, 24, stride=1, scope='block3_1') 
            net = _inverted_residual_block(net, 24, stride=1, scope='block3_2')

            dsn4 = net
            net = _inverted_residual_block(net, 48, stride=2, scope='block4_0') # size/16
            net = _inverted_residual_block(net, 48, stride=1, scope='block4_1') 
            net = _inverted_residual_block(net, 48, stride=1, scope='block4_2') 
            net = _inverted_residual_block(net, 48, stride=1, scope='block4_3') 

            net = _inverted_residual_block(net, 64, stride=1, scope='block5_0') 
            net = _inverted_residual_block(net, 64, stride=1, scope='block5_1')
            net = _inverted_residual_block(net, 64, stride=1, scope='block5_2')

            dsn5 = net        
        ## dsn layers
        with tf.variable_scope('dsn1'):
            dsn1 = _dsn_1x1_conv2d(dsn1, 1)
            ## dsn1 is already at the input size, so no deconv2d is needed

        with tf.variable_scope('dsn2'):
            dsn2 = _dsn_1x1_conv2d(dsn2, 1)
            dsn2 = _dsn_deconv2d_with_upsample_factor(dsn2, 1, upsample_factor=2)

        with tf.variable_scope('dsn3'):
            dsn3 = _dsn_1x1_conv2d(dsn3, 1)
            dsn3 = _dsn_deconv2d_with_upsample_factor(dsn3, 1, upsample_factor=4)

        with tf.variable_scope('dsn4'):
            dsn4 = _dsn_1x1_conv2d(dsn4, 1)
            dsn4 = _dsn_deconv2d_with_upsample_factor(dsn4, 1, upsample_factor=8)

        with tf.variable_scope('dsn5'):
            dsn5 = _dsn_1x1_conv2d(dsn5, 1)
            dsn5 = _dsn_deconv2d_with_upsample_factor(dsn5, 1, upsample_factor=16)

        # dsn fuse
        with tf.variable_scope('dsn_fuse'):
            dsn_fuse = tf.concat([dsn1, dsn2, dsn3, dsn4, dsn5], 3)
            dsn_fuse = _output_1x1_conv2d(dsn_fuse, 1)    
    return dsn_fuse, dsn1, dsn2, dsn3, dsn4, dsn5

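This MobileNet V2 style HED network has the same overall structure as the VGG style HED; the VGG convolutional layers are simply replaced with the corresponding MobileNet V2 layers. In addition, because the first convolutional layer of MobileNet V2 already uses stride=2, its output does not match the size needed for the dsn1 layer. Two extra ordinary convolutional layers with stride=1 are therefore added, and their output is used as the dsn1 layer.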
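
The relationship between each dsn branch and its upsample factor can be checked with a little arithmetic. The sketch below assumes 'same' padding throughout, so a stride-s convolution maps a size n to ceil(n / s) and a stride-s transposed convolution maps n to n * s; the helper name dsn_plan is hypothetical.

```python
import math

# Downsampling factor of the feature map each dsn branch taps into
DSN_DOWNSAMPLE = {'dsn1': 1, 'dsn2': 2, 'dsn3': 4, 'dsn4': 8, 'dsn5': 16}


def dsn_plan(image_size):
    """Map branch name -> (feature size, upsample factor, deconv kernel size, restored size)."""
    plan = {}
    for name, factor in DSN_DOWNSAMPLE.items():
        feature_size = math.ceil(image_size / factor)
        # kernel_size = 2 * upsample_factor, as in the listing; dsn1 needs no deconv
        kernel = 2 * factor if factor > 1 else None
        restored = feature_size * factor  # 'same' transposed conv multiplies the size by the stride
        plan[name] = (feature_size, factor, kernel, restored)
    return plan


for name, row in dsn_plan(256).items():
    print(name, row)
```

Under these assumptions every branch is restored to exactly the input size only when the image side is a multiple of 16. For a side of 250, for instance, dsn5 would come back as 256 and could not be concatenated with dsn1, so resizing or padding the input beforehand seems advisable.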

MobileNet V2 As Base Net

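MobileNet is a classification network designed for the mobile runtime environment. However, like ResNet, Inception and Xception, which are also classification networks, it can serve as the base net of networks for other tasks, extracting feature maps from the input image. I have tried mobilenet_v2_style_unet, mobilenet_v2_style_deeplab_v3plus and mobilenet_v2_style_ssd, and all of them produced visible results.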

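Android performance bottleneck

As a reference value, running this mobilenet_v2_style_hed network plus the subsequent point-finding algorithm on an iPhone 7 Plus reaches 12 FPS, which is roughly enough for real-time use. On Android, however, even high-priced, high-spec devices gave a very low FPS with obvious stuttering.

Investigation turned up some clues. On the iPhone 7 Plus, the computation is distributed as shown in the figure below: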

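The three kinds of operations in the red box take up most of the CPU time. A rough estimate from these values, taken as milliseconds per frame, gives 1000 / (32 + 30 + 10 + 6) ≈ 12.8, which agrees well with the measured FPS. This shows that most of the time is spent in the neural network, and that the point-finding algorithm implemented with OpenCV takes very little time.

On Android, however, the situation is completely different, as shown in the figure below: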

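Using the values in the red box, again taken as milliseconds, FPS = 1000 / (232 + 76 + 29 + 16) ≈ 2.8, which falls short of real time. The figure also shows that on Android Batch Normalization consumes a large amount of computation time, and is no longer on the same order of magnitude as the CPU time consumed by Conv2D, which is a completely different distribution from iOS. Further debugging showed that our Android app was, for historical reasons, restricted to 32-bit .so libraries. After switching to a 64-bit TensorFlow library and measuring again in a standalone demo app, mobilenet_v2_style_hed on Android behaved much like on iOS: still slower than iOS, but with the same distribution of CPU time.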
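
The two back-of-the-envelope estimates can be reproduced in a few lines. The sketch assumes the profiler values are milliseconds per frame, which is the reading under which the sums match the reported FPS; the function name estimated_fps is hypothetical.

```python
def estimated_fps(op_times_ms):
    """Upper bound on FPS if one frame costs the sum of the given per-op times (in ms)."""
    return 1000.0 / sum(op_times_ms)


ios_fps = estimated_fps([32, 30, 10, 6])          # values quoted for the iPhone 7 Plus
android_fps = estimated_fps([232, 76, 29, 16])    # values quoted for the 32-bit Android build

print(round(ios_fps, 1), round(android_fps, 1))
```

This is only an upper bound, since it ignores every operation outside the red box as well as the OpenCV post-processing.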

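The performance bottleneck, then, is that Batch Normalization runs inefficiently on a 32-bit ARM CPU. I tried recompiling the 32-bit TensorFlow library with various compiler optimization options, but saw no clear improvement. The final solution was to settle for second best: use vgg_style_hed without Batch Normalization. After this change, the statistics on Android are as shown in the figure below:

About TensorFlow Lite

When the model was deployed with TensorFlow 1.7, TensorFlow Lite did not yet support transposed convolution, so TF Lite was not used (a Lite version of transpose_conv.cc (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/kernels/transpose_conv.cc) can now be seen on GitHub). TensorFlow Lite is developing quickly, and for future deployments it should be preferred over TensorFlow Mobile.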

References

xavier init

How to do Xavier initialization on TensorFlow

(https://stackoverflow.com/questions/33640581/how-to-do-xavier-initialization-on-tensorflow/36784797)

聊一聊深度学习的weight initialization (A chat about weight initialization in deep learning)

(https://zhuanlan.zhihu.com/p/25110150)

Batch Normalization

Understanding the backward pass through Batch Normalization Layer

(https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html)

机器学习里的黑色艺术:normalization, standardization, regularization (The black art of machine learning: normalization, standardization, regularization)

(https://zhuanlan.zhihu.com/p/29974820)

How could I use Batch Normalization in TensorFlow?

(https://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow)

add Batch Normalization immediately before non-linearity or after in Keras?

(https://github.com/keras-team/keras/issues/5465)

1x1 Convolution

What does 1x1 convolution mean in a neural network?

(https://stats.stackexchange.com/questions/194142/what-does-1x1-convolution-mean-in-a-neural-network.)

How are 1x1 convolutions the same as a fully connected layer?

(https://datascience.stackexchange.com/questions/12830/how-are-1x1-convolutions-the-same-as-a-fully-connected-layer)

One by One [ 1 x 1 ] Convolution - counter-intuitively useful

(https://iamaaditya.github.io/2016/03/one-by-one-convolution/)

Upsampling && Transposed Convolution

Upsampling and Image Segmentation with Tensorflow and TF-Slim

(http://warmspringwinds.github.io/tensorflow/tf-slim/2016/11/22/upsampling-and-image-segmentation-with-tensorflow-and-tf-slim/)

Image Segmentation using deconvolution layer in Tensorflow

(http://cv-tricks.com/image-segmentation/transpose-convolution-in-tensorflow/)

ResNet && Inception && Xception

Network In Network architecture: The beginning of Inception

(http://teleported.in/posts/network-in-network/)

ResNets, HighwayNets, and DenseNets, Oh My!

(https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)

Inception modules: explained and implemented

(https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/)

TensorFlow implementation of the Xception Model by François Chollet

(https://github.com/kwotsin/TensorFlow-Xception)

TensorFlow Lite

TensorFlow Lite 深度解析 (An in-depth look at TensorFlow Lite)

(http://developers.googleblog.cn/2018/06/tensorflow-lite-overview.html)

Originally published on the WeChat official account 腾讯Bugly (weixinBugly)

Original publication date: 2018-06-07
