Running a Convolutional Neural Network on Mobile for Document Detection (Part 2) -- From VGG to MobileNetV2 (Continued)

From MobileNet V1 to MobileNet V2

ResNet, Inception, and Xception all pursue the same goal: reach higher accuracy while striking a balance among model size, inference speed, and training speed. If you are instead willing to accept some loss of accuracy in exchange for an even smaller model and faster inference, you arrive directly at MobileNet and similar network architectures designed to run on phones or embedded devices.

MobileNet V1 (https://arxiv.org/pdf/1704.04861.pdf) and MobileNet V2 (https://arxiv.org/pdf/1801.04381.pdf) are both built on Depthwise Separable Convolution (similar to, but not exactly the same as, the Separable Convolution used by Xception). This is one key factor behind their small size and fast inference; the other is layer structures that were carefully designed and tuned through experiments. Below, both versions are implemented in code, following the papers.
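To make the size/speed advantage concrete, here is a minimal sketch (my own illustration, not code from the original article) that compares multiply-add counts of a standard convolution and a depthwise separable convolution, using the cost formulas from the MobileNet V1 paper; the layer shape in the example is an arbitrary assumption:

# Multiply-add counts for one conv layer, following the MobileNet V1 paper:
#   standard conv:            k * k * c_in * c_out * h * w
#   depthwise separable conv: k * k * c_in * h * w  +  c_in * c_out * h * w
def standard_conv_madds(h, w, c_in, c_out, k):
    return k * k * c_in * c_out * h * w

def depthwise_separable_madds(h, w, c_in, c_out, k):
    return k * k * c_in * h * w + c_in * c_out * h * w

# Hypothetical layer: 112x112 feature map, 64 -> 128 channels, 3x3 kernel
std = standard_conv_madds(112, 112, 64, 128, 3)
sep = depthwise_separable_madds(112, 112, 64, 128, 3)
print('standard: %d, separable: %d, speedup: %.1fx' % (std, sep, float(std) / sep))
# With a 3x3 kernel the ratio is roughly 1 / (1/c_out + 1/9), i.e. about 8-9x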

MobileNet V1

The overall structure of MobileNet V1 is not particularly complex. Like VGG, its layers are simply chained one after another; the differences lie inside each layer, as shown in the figure below:

The figure does not use arrows to indicate the direction of data flow, but anyone with basic experience with convolutional networks can see that data flows from top to bottom. The left side is a standard convolution layer, similar in structure to the _vgg_conv2d function in the earlier HED network. (Recall the earlier discussion of the ordering of Batch Normalization and ReLU: although Batch Normalization can be placed after the activation function, many papers habitually put it before the activation, so the code here strictly follows the papers.) The right side is essentially a separable convolution, but with Batch Normalization applied twice inside, once after the depthwise part and once after the pointwise part.

The paper describes the overall structure with the following table:

Here is a simple implementation:

def mobilenet_v1(inputs, alpha, is_training):
    if alpha not in [0.25, 0.50, 0.75, 1.0]:
        raise ValueError('alpha must be one of'
                         '`0.25`, `0.50`, `0.75` or `1.0` only.')

    filter_initializer = tf.contrib.layers.xavier_initializer()

    def _conv2d(inputs, filters, kernel_size, stride, scope=''):
        with tf.variable_scope(scope):
            outputs = tf.layers.conv2d(inputs,
                                    filters, 
                                    kernel_size, 
                                    strides=(stride, stride),
                                    padding='same', 
                                    activation=None,
                                    use_bias=False, 
                                    kernel_initializer=filter_initializer)

            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            outputs = tf.nn.relu(outputs)        
        return outputs

    def _mobilenet_v1_conv2d(inputs,
                             pointwise_conv_filters,
                             depthwise_conv_kernel_size,
                             stride,  # stride is just for depthwise convolution
                             scope=''):
        with tf.variable_scope(scope):
            with tf.variable_scope('depthwise_conv'):                
                '''
                The tf.layers module has a tf.layers.separable_conv2d function,
                but its internal call sequence is
                depthwise convolution --> pointwise convolution --> activation func,
                whereas a MobileNet V1 style convolution layer needs
                depthwise conv --> batch norm --> relu --> pointwise conv --> batch norm --> relu,
                so the desired call sequence has to be assembled by other means.
                One option is tf.nn.depthwise_conv2d, but that API is rather low
                level and leads to clumsy code.
                A better option is tf.contrib.layers.separable_conv2d: when its
                second argument num_outputs is set to None, only the internal
                depthwise conv2d part runs and the pointwise conv2d part is
                skipped, which is exactly what is needed to assemble the
                MobileNet V1 layer structure.

                TensorFlow offers four APIs all named separable_conv2d, with
                subtle differences between them; interested readers can consult
                the documentation:
                tf.contrib.layers.separable_conv2d [alias tf.contrib.layers.separable_convolution2d]
                VS
                tf.keras.backend.separable_conv2d
                VS
                tf.layers.separable_conv2d
                VS
                tf.nn.separable_conv2d
                '''
                outputs = tf.contrib.layers.separable_conv2d(
                            inputs,            
                            None, # ref https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.py
                            depthwise_conv_kernel_size,
                            depth_multiplier=1, # set to 1, following the paper
                            stride=(stride, stride),
                            padding='SAME',
                            activation_fn=None,
                            weights_initializer=filter_initializer,
                            biases_initializer=None)

                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = tf.nn.relu(outputs)     
            with tf.variable_scope('pointwise_conv'):
                # The alpha parameter from the paper: shrink the number of
                # output channels of each pointwise conv by this ratio.
                pointwise_conv_filters = int(pointwise_conv_filters * alpha)
                outputs = tf.layers.conv2d(outputs,
                                        pointwise_conv_filters,
                                        (1, 1), 
                                        padding='same', 
                                        activation=None,
                                        use_bias=False, 
                                        kernel_initializer=filter_initializer)

                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = tf.nn.relu(outputs)        
        return outputs

    def _avg_pool2d(inputs, scope=''):
        inputs_shape = inputs.get_shape().as_list()        
        assert len(inputs_shape) == 4

        pool_height = inputs_shape[1]
        pool_width = inputs_shape[2]        
        with tf.variable_scope(scope):
            outputs = tf.layers.average_pooling2d(inputs,
                                      [pool_height, pool_width],
                                      strides=(1, 1),
                                      padding='valid')        
        return outputs

    '''
    A network that performs a classification task can usually also serve as the
    base architecture of networks that perform other tasks. To make the code
    easy to reuse, only the convolutional body is implemented here; callers use
    the returned output and end_points according to their own needs.
    For a classification task, for example, this function is used like this:

    image_height = 224
    image_width = 224
    image_channels = 3

    x = tf.placeholder(tf.float32, [None, image_height, image_width, image_channels])
    is_training = tf.placeholder(tf.bool, name='is_training')

    output, net = mobilenet_v1(x, 1.0, is_training)
    print('output shape is: %r' % (output.get_shape().as_list()))

    output = tf.layers.flatten(output)
    output = tf.layers.dense(output,
                        units=1024, # 1024 classes
                        activation=None,
                        use_bias=True,
                        kernel_initializer=tf.contrib.layers.xavier_initializer())
    print('output shape is: %r' % (output.get_shape().as_list()))
    '''
    with tf.variable_scope('mobilenet', 'mobilenet', [inputs]):
        end_points = {}
        net = inputs 

        net = _conv2d(net, 32, [3, 3], stride=2, scope='block0')
        end_points['block0'] = net
        net = _mobilenet_v1_conv2d(net, 64, [3, 3], stride=1, scope='block1')
        end_points['block1'] = net

        net = _mobilenet_v1_conv2d(net, 128, [3, 3], stride=2, scope='block2')
        end_points['block2'] = net
        net = _mobilenet_v1_conv2d(net, 128, [3, 3], stride=1, scope='block3')
        end_points['block3'] = net

        net = _mobilenet_v1_conv2d(net, 256, [3, 3], stride=2, scope='block4')
        end_points['block4'] = net
        net = _mobilenet_v1_conv2d(net, 256, [3, 3], stride=1, scope='block5')
        end_points['block5'] = net

        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=2, scope='block6')
        end_points['block6'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block7')
        end_points['block7'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block8')
        end_points['block8'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block9')
        end_points['block9'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block10')
        end_points['block10'] = net
        net = _mobilenet_v1_conv2d(net, 512, [3, 3], stride=1, scope='block11')
        end_points['block11'] = net

        net = _mobilenet_v1_conv2d(net, 1024, [3, 3], stride=2, scope='block12')
        end_points['block12'] = net
        net = _mobilenet_v1_conv2d(net, 1024, [3, 3], stride=1, scope='block13')
        end_points['block13'] = net

        output = _avg_pool2d(net, scope='output')    
        return output, end_points
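As a quick sanity check of the structure above, here is a short usage sketch (my own addition, along the lines of the classification example in the docstring): with a 224x224 input, the five stride=2 layers reduce the feature map to 7x7 before the average pooling, so the returned output should be [None, 1, 1, 1024] when alpha is 1.0.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 224, 224, 3])
is_training = tf.placeholder(tf.bool, name='is_training')

output, end_points = mobilenet_v1(x, 1.0, is_training)
print(end_points['block13'].get_shape().as_list())  # expect [None, 7, 7, 1024]
print(output.get_shape().as_list())                 # expect [None, 1, 1, 1024]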

MobileNet V2

MobileNet V2 changes quite a lot. First, it introduces two new layer structures, shown below:

One obvious difference is that the structure on the left adopts the residual-network technique. In addition, in both structures a 1x1 convolution is added before the depthwise convolution. In the earlier examples, 1x1 convolutions were used to reduce dimensionality, but in MobileNet V2 this 1x1 convolution before the depthwise convolution is actually used to increase dimensionality; this is what the expansion factor parameter in the paper refers to. There is still a 1x1 convolution after the depthwise convolution, but it is not followed by an activation function: it is just a linear transformation, so it is not called a pointwise convolution here but corresponds to the 1x1 projection convolution in the paper.
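Before reading the code, a worked example of the channel arithmetic may help (illustrative numbers of my own, not taken from the paper's table): with expansion factor 6, a block whose input has 24 channels is first expanded by the 1x1 convolution to 144 channels, the 3x3 depthwise convolution then runs on those 144 channels, and the 1x1 projection convolution maps them linearly back down to the block's output channel count, say 24, at which point the residual connection can add input and output directly.

# Channel flow through one inverted residual block (illustrative numbers):
#   expansion_1x1:  c_in -> c_in * expansion   (BN + ReLU6)
#   depthwise_3x3:  channel count unchanged    (BN + ReLU6)
#   projection_1x1: c_in * expansion -> c_out  (BN only, linear)
def inverted_residual_channels(c_in, c_out, expansion=6):
    c_expanded = c_in * expansion
    return [('expansion_1x1', c_in, c_expanded),
            ('depthwise_3x3', c_expanded, c_expanded),
            ('projection_1x1', c_expanded, c_out)]

for name, c_from, c_to in inverted_residual_channels(24, 24):
    print('%s: %d -> %d channels' % (name, c_from, c_to))
# expansion_1x1: 24 -> 144 channels
# depthwise_3x3: 144 -> 144 channels
# projection_1x1: 144 -> 24 channels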

The overall structure of the network is described by the following table:

The implementation is as follows:

def mobilenet_v2_func_blocks(is_training):
    filter_initializer = tf.contrib.layers.xavier_initializer()
    activation_func = tf.nn.relu6

    def conv2d(inputs, filters, kernel_size, stride, scope=''):
        with tf.variable_scope(scope):            
            with tf.variable_scope('conv2d'):
                outputs = tf.layers.conv2d(inputs,
                                        filters, 
                                        kernel_size, 
                                        strides=(stride, stride),
                                        padding='same', 
                                        activation=None,
                                        use_bias=False, 
                                        kernel_initializer=filter_initializer)

                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = tf.nn.relu(outputs)            
        return outputs

    def _1x1_conv2d(inputs, filters, stride):
        kernel_size = [1, 1]        
        with tf.variable_scope('1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs,
                                    filters, 
                                    kernel_size, 
                                    strides=(stride, stride),
                                    padding='same', 
                                    activation=None,
                                    use_bias=False, 
                                    kernel_initializer=filter_initializer)

            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            # no activation_func
        return outputs

    def expansion_conv2d(inputs, expansion, stride):
        input_shape = inputs.get_shape().as_list()        
        assert len(input_shape) == 4
        filters = input_shape[3] * expansion

        kernel_size = [1, 1]        
        with tf.variable_scope('expansion_1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs,
                                    filters, 
                                    kernel_size, 
                                    strides=(stride, stride),
                                    padding='same', 
                                    activation=None,
                                    use_bias=False, 
                                    kernel_initializer=filter_initializer)

            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            outputs = activation_func(outputs)        
        return outputs

    def projection_conv2d(inputs, filters, stride):
        kernel_size = [1, 1]        
        with tf.variable_scope('projection_1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs,
                                    filters, 
                                    kernel_size, 
                                    strides=(stride, stride),
                                    padding='same', 
                                    activation=None,
                                    use_bias=False, 
                                    kernel_initializer=filter_initializer)

            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            # no activation_func
        return outputs

    def depthwise_conv2d(inputs,
                         depthwise_conv_kernel_size,
                         stride):
        with tf.variable_scope('depthwise_conv2d'):
            outputs = tf.contrib.layers.separable_conv2d(
                        inputs,                        
                        None, # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.py
                        depthwise_conv_kernel_size,
                        depth_multiplier=1,
                        stride=(stride, stride),
                        padding='SAME',
                        activation_fn=None,
                        weights_initializer=filter_initializer,
                        biases_initializer=None) 

            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            outputs = activation_func(outputs)        
        return outputs

    def avg_pool2d(inputs, scope=''):
        inputs_shape = inputs.get_shape().as_list()        
        assert len(inputs_shape) == 4

        pool_height = inputs_shape[1]
        pool_width = inputs_shape[2]
        with tf.variable_scope(scope):
            outputs = tf.layers.average_pooling2d(inputs,
                                            [pool_height, pool_width],
                                            strides=(1, 1),
                                            padding='valid')
        return outputs

    def inverted_residual_block(inputs,
                                filters,
                                stride,
                                expansion=6,
                                scope=''):
        assert stride == 1 or stride == 2

        depthwise_conv_kernel_size = [3, 3]
        pointwise_conv_filters = filters

        with tf.variable_scope(scope):
            net = inputs
            net = expansion_conv2d(net, expansion, stride=1)
            net = depthwise_conv2d(net, depthwise_conv_kernel_size, stride=stride)
            net = projection_conv2d(net, pointwise_conv_filters, stride=1)            
            if stride == 1:
                # If net and inputs do not have the same number of channels,
                # use a 1x1 convolution to make the channel counts equal so
                # that the two tensors can be added.
                if net.get_shape().as_list()[3] != inputs.get_shape().as_list()[3]:
                    inputs = _1x1_conv2d(inputs, net.get_shape().as_list()[3], stride=1)

                net = net + inputs
                return net
            else:
                # stride == 2
                return net

    func_blocks = {}
    func_blocks['conv2d'] = conv2d
    func_blocks['inverted_residual_block'] = inverted_residual_block
    func_blocks['avg_pool2d'] = avg_pool2d
    func_blocks['filter_initializer'] = filter_initializer
    func_blocks['activation_func'] = activation_func    
    return func_blocks


def mobilenet_v2(inputs, is_training):
    func_blocks = mobilenet_v2_func_blocks(is_training)
    _conv2d = func_blocks['conv2d'] 
    _inverted_residual_block = func_blocks['inverted_residual_block']
    _avg_pool2d = func_blocks['avg_pool2d']    
    with tf.variable_scope('mobilenet_v2', 'mobilenet_v2', [inputs]):
        end_points = {}
        net = inputs 

        net = _conv2d(net, 32, [3, 3], stride=2, scope='block0_0') # size/2
        end_points['block0'] = net

        net = _inverted_residual_block(net, 16, stride=1, expansion=1, scope='block1_0')
        end_points['block1'] = net

        net = _inverted_residual_block(net, 24, stride=2, scope='block2_0') # size/4
        net = _inverted_residual_block(net, 24, stride=1, scope='block2_1')
        end_points['block2'] = net

        net = _inverted_residual_block(net, 32, stride=2, scope='block3_0') # size/8
        net = _inverted_residual_block(net, 32, stride=1, scope='block3_1') 
        net = _inverted_residual_block(net, 32, stride=1, scope='block3_2')
        end_points['block3'] = net

        net = _inverted_residual_block(net, 64, stride=2, scope='block4_0') # size/16
        net = _inverted_residual_block(net, 64, stride=1, scope='block4_1') 
        net = _inverted_residual_block(net, 64, stride=1, scope='block4_2') 
        net = _inverted_residual_block(net, 64, stride=1, scope='block4_3') 
        end_points['block4'] = net

        net = _inverted_residual_block(net, 96, stride=1, scope='block5_0') 
        net = _inverted_residual_block(net, 96, stride=1, scope='block5_1')
        net = _inverted_residual_block(net, 96, stride=1, scope='block5_2')
        end_points['block5'] = net

        net = _inverted_residual_block(net, 160, stride=2, scope='block6_0') # size/32
        net = _inverted_residual_block(net, 160, stride=1, scope='block6_1') 
        net = _inverted_residual_block(net, 160, stride=1, scope='block6_2') 
        end_points['block6'] = net

        net = _inverted_residual_block(net, 320, stride=1, scope='block7_0')
        end_points['block7'] = net

        net = _conv2d(net, 1280, [1, 1], stride=1, scope='block8_0') 
        end_points['block8'] = net

        output = _avg_pool2d(net, scope='output')    
    return output, end_points
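The same kind of sanity check applies here (again my own sketch, not from the original article): the five stride=2 layers reduce a 224x224 input by a factor of 32, so after the final 1x1 convolution to 1280 channels and the global average pooling, the output should be [None, 1, 1, 1280].

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 224, 224, 3])
is_training = tf.placeholder(tf.bool, name='is_training')

output, end_points = mobilenet_v2(x, is_training)
print(end_points['block8'].get_shape().as_list())  # expect [None, 7, 7, 1280]
print(output.get_shape().as_list())                # expect [None, 1, 1, 1280]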

MobileNet V2 Style HED

The original HED uses VGG as the base network to obtain the feature maps. Following the same idea, the base network can be replaced with MobileNet V2:

def mobilenet_v2_style_hed(inputs, batch_size, is_training):
    if const.use_kernel_regularizer:
        weights_regularizer = tf.contrib.layers.l2_regularizer(scale=0.0001)
    else:
        weights_regularizer = None

    ####################################################
    func_blocks = mobilenet_v2_func_blocks(is_training)    
    # print('============ func_blocks are: %r' % func_blocks)
    _conv2d = func_blocks['conv2d'] 
    _inverted_residual_block = func_blocks['inverted_residual_block']
    _avg_pool2d = func_blocks['avg_pool2d']
    filter_initializer = func_blocks['filter_initializer']
    activation_func = func_blocks['activation_func']    
    ####################################################

    def _dsn_1x1_conv2d(inputs, filters):
        kernel_size = [1, 1]
        outputs = tf.layers.conv2d(inputs,
                                   filters,
                                   kernel_size, 
                                   padding='same', 
                                   activation=None, ## no activation
                                   use_bias=False, 
                                   kernel_initializer=filter_initializer,
                                   kernel_regularizer=weights_regularizer)

        outputs = tf.layers.batch_normalization(outputs, training=is_training)        
        ## no activation
        return outputs

    def _output_1x1_conv2d(inputs, filters):
        kernel_size = [1, 1]
        outputs = tf.layers.conv2d(inputs,
                                   filters,
                                   kernel_size, 
                                   padding='same', 
                                   activation=None, ## no activation
                                   use_bias=True, ## use bias
                                   kernel_initializer=filter_initializer,
                                   kernel_regularizer=weights_regularizer)        
        ## no batch normalization
        ## no activation

        return outputs

    def _dsn_deconv2d_with_upsample_factor(inputs, filters, upsample_factor):
        ## https://github.com/s9xie/hed/blob/master/examples/hed/train_val.prototxt
        ## The original HED code computes kernel_size from upsample_factor like this:
        kernel_size = [2 * upsample_factor, 2 * upsample_factor]
        outputs = tf.layers.conv2d_transpose(inputs,
                                             filters, 
                                             kernel_size, 
                                             strides=(upsample_factor, upsample_factor), 
                                             padding='same', 
                                             activation=None, ## no activation
                                             use_bias=True, ## use bias
                                             kernel_initializer=filter_initializer,
                                             kernel_regularizer=weights_regularizer)        
        ## Conceptually, the deconv2d layers are already the final output layers;
        ## the only remaining step is a 1x1 conv2d that fuses the 5 deconv2d
        ## outputs together, so batch normalization is not needed here.

        return outputs

    with tf.variable_scope('hed', 'hed', [inputs]):
        end_points = {}
        net = inputs

        ## mobilenet v2 as base net
        with tf.variable_scope('mobilenet_v2'):
            # The standard mobilenet v2 does not contain these two layers; they
            # are added here to obtain a feature map with the same size as the
            # input image.
            net = _conv2d(net, 3, [3, 3], stride=1, scope='block0_0')
            net = _conv2d(net, 6, [3, 3], stride=1, scope='block0_1')

            dsn1 = net
            net = _conv2d(net, 12, [3, 3], stride=2, scope='block0_2') # size/2

            net = _inverted_residual_block(net, 6, stride=1, expansion=1, scope='block1_0')

            dsn2 = net
            net = _inverted_residual_block(net, 12, stride=2, scope='block2_0') # size/4
            net = _inverted_residual_block(net, 12, stride=1, scope='block2_1')

            dsn3 = net
            net = _inverted_residual_block(net, 24, stride=2, scope='block3_0') # size/8
            net = _inverted_residual_block(net, 24, stride=1, scope='block3_1') 
            net = _inverted_residual_block(net, 24, stride=1, scope='block3_2')

            dsn4 = net
            net = _inverted_residual_block(net, 48, stride=2, scope='block4_0') # size/16
            net = _inverted_residual_block(net, 48, stride=1, scope='block4_1') 
            net = _inverted_residual_block(net, 48, stride=1, scope='block4_2') 
            net = _inverted_residual_block(net, 48, stride=1, scope='block4_3') 

            net = _inverted_residual_block(net, 64, stride=1, scope='block5_0') 
            net = _inverted_residual_block(net, 64, stride=1, scope='block5_1')
            net = _inverted_residual_block(net, 64, stride=1, scope='block5_2')

            dsn5 = net

        ## dsn layers
        with tf.variable_scope('dsn1'):
            dsn1 = _dsn_1x1_conv2d(dsn1, 1)
            ## dsn1 is already at the input size, so no deconv2d is needed
        with tf.variable_scope('dsn2'):
            dsn2 = _dsn_1x1_conv2d(dsn2, 1)
            dsn2 = _dsn_deconv2d_with_upsample_factor(dsn2, 1, upsample_factor=2)

        with tf.variable_scope('dsn3'):
            dsn3 = _dsn_1x1_conv2d(dsn3, 1)
            dsn3 = _dsn_deconv2d_with_upsample_factor(dsn3, 1, upsample_factor=4)

        with tf.variable_scope('dsn4'):
            dsn4 = _dsn_1x1_conv2d(dsn4, 1)
            dsn4 = _dsn_deconv2d_with_upsample_factor(dsn4, 1, upsample_factor=8)

        with tf.variable_scope('dsn5'):
            dsn5 = _dsn_1x1_conv2d(dsn5, 1)
            dsn5 = _dsn_deconv2d_with_upsample_factor(dsn5, 1, upsample_factor=16)

        # dsn fuse
        with tf.variable_scope('dsn_fuse'):
            dsn_fuse = tf.concat([dsn1, dsn2, dsn3, dsn4, dsn5], 3)
            dsn_fuse = _output_1x1_conv2d(dsn_fuse, 1)    
    return dsn_fuse, dsn1, dsn2, dsn3, dsn4, dsn5

This MobileNet V2 style HED network has the same overall structure as the VGG style HED; the convolution layers used by VGG are simply replaced with the corresponding MobileNet V2 layers. In addition, because the first convolution layer of MobileNet V2 uses stride=2, its output size does not match the dsn1 layer, so two extra stride=1 plain convolution layers are added and their output is used as the dsn1 layer.
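The upsample factors passed to _dsn_deconv2d_with_upsample_factor are chosen so that every side output returns to the input resolution before the fusion step. Here is a small sketch of the expected sizes (my own illustration, assuming a 256x256 input):

# Spatial size of each dsn branch before its deconv2d, and the upsample
# factor that brings it back to the input size (assuming a 256x256 input):
input_size = 256
dsn_downsample = [('dsn1', 1), ('dsn2', 2), ('dsn3', 4), ('dsn4', 8), ('dsn5', 16)]
for name, factor in dsn_downsample:
    size = input_size // factor
    print('%s: %dx%d feature map, upsample_factor = %d' % (name, size, size, factor))
# dsn1: 256x256 feature map, upsample_factor = 1  (no deconv2d needed)
# dsn2: 128x128 feature map, upsample_factor = 2
# ...
# dsn5: 16x16 feature map, upsample_factor = 16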

MobileNet V2 As Base Net

MobileNet was designed for the phone environment to perform classification, but like ResNet, Inception, and Xception, which also perform classification, it can serve as the base net of networks that perform other tasks, extracting feature maps from the input image. I have tried mobilenet_v2_style_unet, mobilenet_v2_style_deeplab_v3plus, and mobilenet_v2_style_ssd, and all of them produce visible results.

Performance Bottlenecks on Android

As a reference point, running this mobilenet_v2_style_hed network plus the subsequent corner-finding algorithm on an iPhone 7 Plus reaches 12 FPS, which basically meets the real-time requirement. When we tried to deploy on Android, however, the FPS was very low even on high-priced, well-configured devices, with very noticeable stuttering.

Some investigation turned up clues. On the iPhone 7 Plus, the computation is distributed as shown below:

The three kinds of operations in the red boxes take up most of the CPU time. A rough estimate with these numbers, treating them as per-frame times in milliseconds, gives 1000 / (32 + 30 + 10 + 6) ≈ 12.8, which matches the measured FPS well. Most of the computation time therefore goes into the neural network, while the OpenCV-based corner-finding algorithm takes very little time.

On Android, however, the situation is completely different, as shown below:

Calculating with the numbers in the red boxes, FPS = 1000 / (232 + 76 + 29 + 16) ≈ 2.8, which falls short of real time. The figure also shows that on Android, Batch Normalization consumed a large amount of computation time, no longer even in the same order of magnitude as Conv2D, a completely different distribution from the one on iOS. Further debugging revealed that for historical reasons our Android app was restricted to 32-bit .so dynamic libraries. After switching to a 64-bit TensorFlow library and re-measuring in a standalone demo app, the behavior of mobilenet_v2_style_hed on Android became close to that on iOS: still slower than iOS, but with the same distribution of CPU time.

The performance bottleneck, then, is that Batch Normalization executes inefficiently on 32-bit ARM CPUs. We tried recompiling the 32-bit TensorFlow library with various compiler optimization flags, but saw no noticeable improvement. The final solution was a compromise: use vgg_style_hed without Batch Normalization. After this adjustment, the statistics on Android look like this:

About TensorFlow Lite

When the model was deployed with TensorFlow 1.7, TensorFlow Lite did not yet support transposed convolution, so TF Lite was not used (a Lite implementation, transpose_conv.cc (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/kernels/transpose_conv.cc), has since appeared on GitHub). TensorFlow Lite is developing quickly; going forward, TensorFlow Lite should be preferred over TensorFlow Mobile when choosing a deployment option.

References

xavier init

How to do Xavier initialization on TensorFlow (https://stackoverflow.com/questions/33640581/how-to-do-xavier-initialization-on-tensorflow/36784797)

聊一聊深度学习的weight initialization (https://zhuanlan.zhihu.com/p/25110150)

Batch Normalization

Understanding the backward pass through Batch Normalization Layer (https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html)

机器学习里的黑色艺术:normalization, standardization, regularization (https://zhuanlan.zhihu.com/p/29974820)

How could I use Batch Normalization in TensorFlow? (https://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow)

add Batch Normalization immediately before non-linearity or after in Keras? (https://github.com/keras-team/keras/issues/5465)

1x1 Convolution

What does 1x1 convolution mean in a neural network? (https://stats.stackexchange.com/questions/194142/what-does-1x1-convolution-mean-in-a-neural-network)

How are 1x1 convolutions the same as a fully connected layer? (https://datascience.stackexchange.com/questions/12830/how-are-1x1-convolutions-the-same-as-a-fully-connected-layer)

One by One [ 1 x 1 ] Convolution - counter-intuitively useful (https://iamaaditya.github.io/2016/03/one-by-one-convolution/)

Upsampling && Transposed Convolution

Upsampling and Image Segmentation with Tensorflow and TF-Slim (http://warmspringwinds.github.io/tensorflow/tf-slim/2016/11/22/upsampling-and-image-segmentation-with-tensorflow-and-tf-slim/)

Image Segmentation using deconvolution layer in Tensorflow (http://cv-tricks.com/image-segmentation/transpose-convolution-in-tensorflow/)

ResNet && Inception && Xception

Network In Network architecture: The beginning of Inception (http://teleported.in/posts/network-in-network/)

ResNets, HighwayNets, and DenseNets, Oh My! (https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)

Inception modules: explained and implemented (https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/)

TensorFlow implementation of the Xception Model by François Chollet (https://github.com/kwotsin/TensorFlow-Xception)

TensorFlow Lite

TensorFlow Lite 深度解析 (http://developers.googleblog.cn/2018/06/tensorflow-lite-overview.html)

Originally published on the WeChat public account 腾讯Bugly (weixinBugly) on 2018-06-07.
