A Code Walkthrough of the Inception Network for Face Recognition

Source: 企鹅号 (Tencent Penguin account) - 戏说AI大数据

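TensorFlow has a lot of fundamentals to cover, and any quick treatment would leave out far more than it includes, so I am skipping Section 2.2 for now and will come back to it once I find a good way to present it.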

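This chapter walks through the code of the Inception network, which can be used to classify faces. Broadly, face recognition work breaks down into three parts: face detection, face alignment, and face recognition. Remember the little box that appears around a face when you take a photo with your phone? That is face detection. The detector used here is MTCNN. The name sounds intimidating, but it is really just three simple convolutional neural networks combined in a cascade (figure taken from the paper Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks):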

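NMS (non-maximum suppression) merges the many overlapping candidate boxes: the highest-scoring box is kept and the boxes that overlap it heavily are dropped. P-Net is a purely convolutional structure, so it can process images of any size, and it produces a large number of candidate boxes. Because faces come in different sizes, detection has to run over a so-called "image pyramid", that is, the same image rescaled to several sizes. The remaining two networks (R-Net and O-Net) combine convolutional and fully connected layers and are used to refine the boxes and align the face. Their network structures are as follows: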

To repeat, both figures above are taken from the paper Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks.
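To make the box-merging step concrete, here is a minimal sketch of greedy non-maximum suppression in plain Python. It is not MTCNN's actual code: the function names `iou` and `nms`, the 0.5 overlap threshold, and the sample boxes are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical candidates: two boxes on the same face, one on another face.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap anything and survives.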

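Face detection finds the faces contained in a picture. It is a relatively simple task, and it lays the groundwork for the identity recognition that follows. Identity recognition corresponds to FaceRecognise, which is the main subject of this article. Here the job of the Inception network is to classify faces. Other classification architectures exist as well, such as ResNet, which has a completely different structure and is very deep, at over a hundred layers.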

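TensorFlow ships a complete Inception network. Most of its high-level APIs live in the contrib library:

The slim library contains many ready-made networks, which saves us the trouble of building them ourselves.

Opening the inception code directly, we find the network structure of Inception described in the comments: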

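The first few layers are convolution and pooling layers, no different from a traditional convolutional neural network. After them come the mixed layers. In principle each of these is also a combination of several convolutional layers, but the detailed structure has been optimized so that it needs fewer parameters.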

First, let's look at how the network structure defined above is written in TensorFlow.

For the convolutional layers:

    with variable_scope.variable_scope(scope, 'InceptionV3', [inputs]):
        with arg_scope(
                [layers.conv2d, layers_lib.max_pool2d, layers_lib.avg_pool2d],
                stride=1,
                padding='VALID'):
            # 299 x 299 x 3
            end_point = 'Conv2d_1a_3x3'
            net = layers.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 149 x 149 x 32
            end_point = 'Conv2d_2a_3x3'
            net = layers.conv2d(net, depth(32), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 x 147 x 32
            end_point = 'Conv2d_2b_3x3'
            net = layers.conv2d(
                net, depth(64), [3, 3], padding='SAME', scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 x 147 x 64
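The size comments in the excerpt (299, 149, 147, 147) follow from TensorFlow's padding rules, and it can help to check them by hand. The sketch below does that arithmetic in plain Python. The helper name `conv_output_size` and the `stem` list are my own; the kernel sizes, strides and padding modes are read off the excerpt above.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a conv layer under TensorFlow's padding rules."""
    if padding == 'VALID':
        # No padding: only windows that fit entirely inside the input count.
        return math.ceil((size - kernel + 1) / stride)
    if padding == 'SAME':
        # Input is padded so that the output depends on the stride alone.
        return math.ceil(size / stride)
    raise ValueError('padding must be "VALID" or "SAME"')


# (name, kernel, stride, padding) for the three layers in the excerpt.
stem = [
    ('Conv2d_1a_3x3', 3, 2, 'VALID'),
    ('Conv2d_2a_3x3', 3, 1, 'VALID'),
    ('Conv2d_2b_3x3', 3, 1, 'SAME'),
]

size = 299
sizes = []
for name, kernel, stride, padding in stem:
    size = conv_output_size(size, kernel, stride, padding)
    sizes.append(size)
    print(name, size)
```

The stride-2 'VALID' layer roughly halves the image, the second 'VALID' layer trims one pixel from each border, and the 'SAME' layer leaves the size unchanged.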

Remember how we used to build a convolutional layer?

    """
    Define the conv layer
    """
    weight = tf.get_variable("conv_weight", weigh_shape,
                             initializer=tf.random_normal_initializer())
    biases = tf.get_variable("conv_bias", bias_shape,
                             initializer=tf.constant_initializer(0.0))
    conv = tf.nn.conv2d(...,
                        padding='VALID', name="conv_data")
    return acitv(conv + biases)

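Writing all of this by hand is clearly tedious, which is why TensorFlow defines layers:

    from tensorflow.contrib import layers
    from tensorflow.contrib import slim

    layers.conv2d(...)
    # or
    slim.conv2d(...)

This greatly simplifies building neural networks.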

An inception layer, in turn, is a small bundle of convolution and concat operations:

    end_point = 'Mixed_5b'
    with variable_scope.variable_scope(end_point):
        with variable_scope.variable_scope('Branch_0'):
            branch_0 = layers.conv2d(
                net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
        with variable_scope.variable_scope('Branch_1'):
            branch_1 = layers.conv2d(
                net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
            branch_1 = layers.conv2d(
                branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
        with variable_scope.variable_scope('Branch_2'):
            branch_2 = layers.conv2d(
                net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
            branch_2 = layers.conv2d(
                branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
            branch_2 = layers.conv2d(
                branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
        with variable_scope.variable_scope('Branch_3'):
            branch_3 = layers_lib.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
            branch_3 = layers.conv2d(
                branch_3, depth(32), [1, 1], scope='Conv2d_0b_1x1')
        net = array_ops.concat([branch_0, branch_1, branch_2, branch_3], 3)
    end_points[end_point] = net
    if end_point == final_endpoint:
        return net, end_points
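Since the four branches are concatenated along axis 3 (the channel axis), the output depth of Mixed_5b is simply the sum of the branch depths. The sketch below works that out in plain Python. Note the assumption: the excerpt calls `depth(...)` without showing its definition, and I am assuming it scales a channel count by `depth_multiplier` with a floor of `min_depth`, the two parameters that appear in the `inception_v3` signature. The helper names `make_depth` and `mixed_5b_channels` are made up for this example.

```python
def make_depth(depth_multiplier=1.0, min_depth=16):
    """Build a depth() helper (assumed definition, not shown in the excerpt)."""
    return lambda d: max(int(d * depth_multiplier), min_depth)


def mixed_5b_channels(depth):
    """Output channels of the last conv in each Mixed_5b branch."""
    return {
        'Branch_0': depth(64),  # 1x1
        'Branch_1': depth(64),  # 1x1 -> 5x5
        'Branch_2': depth(96),  # 1x1 -> 3x3 -> 3x3
        'Branch_3': depth(32),  # avg pool -> 1x1
    }


# Default arguments: the concat output has 64 + 64 + 96 + 32 channels.
branches = mixed_5b_channels(make_depth())
total = sum(branches.values())
print(branches, total)

# A thinner network under the same assumed rule.
half = sum(mixed_5b_channels(make_depth(depth_multiplier=0.5)).values())
print(half)
```

Concatenating along the channel axis only works if every branch produces the same spatial size, which is why each branch ends with the same height and width and differs only in depth.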

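The Mixed_5b code describes the following network structure:

Because large convolution kernels are computationally expensive, they can be replaced by a deeper stack of small kernels:

Both figures are taken from the paper (http://arxiv.org/abs/1512.00567).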
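A quick count shows why the replacement pays off. Two stacked 3x3 convolutions with stride 1 see the same 5x5 window as a single 5x5 convolution, but need fewer weights. The sketch below compares the two. The channel count of 64 is a hypothetical value, kept the same for input and output to keep the arithmetic simple, and `conv_weights` is a helper written for this example.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


channels = 64  # hypothetical channel count, equal in and out for simplicity

one_5x5 = conv_weights(5, channels, channels)
two_3x3 = 2 * conv_weights(3, channels, channels)
ratio = two_3x3 / one_5x5

print(one_5x5, two_3x3, ratio)
```

With equal channel counts the ratio is 18/25 regardless of the number of channels, so the stacked version uses 28% fewer weights while adding an extra nonlinearity between the two layers.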

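Once several inception layers have been stacked, the whole Inception network is complete. In fact, the entire function

    def inception_v3(inputs,
                     num_classes=1000,
                     is_training=True,
                     dropout_keep_prob=0.8,
                     min_depth=16,
                     depth_multiplier=1.0,
                     prediction_fn=layers_lib.softmax,
                     spatial_squeeze=True,
                     reuse=None,
                     scope='InceptionV3'):
        ...

simply returns the Inception network model, so it can be imported and used directly in face recognition work.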

Published: 2018-03-31 07:25:10
Original link: http://kuaibao.qq.com/s/20180331G06L7F00?refer=cp_1026
