Build deep learning's ResNet in one hour with Google TensorFlow, a multiple-time image-competition champion model

Before you read:

To keep this article compact and easy to follow, mathematical formulas are kept to a minimum, lowering the barrier to entry so we can quickly build the classic ResNet-34 model and put it into training.

Environment: Python 3.6
TensorFlow-gpu 1.5.0
PyCharm

Dataset: MNIST

1. Architecture analysis

I won't dwell on where ResNet came from; most readers are already more or less familiar with it. What exactly makes this model, which has swept the top prizes at the major image-recognition competitions, so different?

Speaking of convolutional models, LeNet, Inception and VGG are the classics we study in image recognition. The figure above compares the classic VGG-19 with Plain-34 and ResNet-34.

In terms of computation, VGG-19's three fully connected layers make it noticeably more expensive than Plain and ResNet, while Plain and ResNet have the same number of parameters.

In terms of training accuracy, the paper compares Plain-18 and Plain-34 against ResNet-18 and ResNet-34. It is easy to see that Plain gains little accuracy as depth increases, whereas ResNet not only improves training accuracy with added layers but is also more accurate than a Plain network of the same depth.

From earlier study we know that as networks get deeper they easily run into "degradation", "vanishing gradients", and overfitting of the training data. In ResNet, the authors propose a solution: add an identity mapping (readers come from different backgrounds, so this is not elaborated here; those interested can read the ResNet authors' paper).

The figure above is a structural sketch of a residual module. For a residual block to be effective it needs two or more layers, and the input x and the output F(x) must have the same dimensions.
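
In code, the core idea is simply "compute F(x), then add x back before the final activation". Below is a minimal sketch of such a two-layer block (the function name is illustrative; the full version used in this article, including the zero-padding case, appears in section 2):

import tensorflow as tf
import tensorflow.contrib.slim as slim

def basic_residual_block(x, channels, ksize=3):
    # F(x): two convolutions; the second one is left un-activated
    f = slim.conv2d(x, channels, [ksize, ksize])                      # conv + ReLU
    f = slim.conv2d(f, channels, [ksize, ksize], activation_fn=None)  # conv only
    # Identity mapping: add the input back, then apply the activation.
    # This requires x and f to have the same shape.
    return tf.nn.relu(f + x)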

For ResNet models deeper than 50 layers, the authors optimize the residual block to further cut computation while keeping accuracy: the two internal 3*3 layers are replaced with a 1*1, 3*3, 1*1 stack. The first 1*1 convolution reduces the channel depth, cutting the block's computation along the depth axis; the second, 3*3 layer extracts image features just as before; and the third, 1*1 layer restores the depth.
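
This article builds ResNet-34, which only needs the two-layer block, so the bottleneck variant is not implemented in section 2. For reference, a minimal sketch of what a 1*1 → 3*3 → 1*1 bottleneck block could look like (using the same slim import as the sketch above; the function name and the 4x channel expansion are assumptions following the ResNet paper's convention):

def bottleneck_block(x, bottleneck_channels):
    out_channels = bottleneck_channels * 4                          # paper-style 4x expansion
    f = slim.conv2d(x, bottleneck_channels, [1, 1])                 # 1*1: reduce depth
    f = slim.conv2d(f, bottleneck_channels, [3, 3])                 # 3*3: extract features
    f = slim.conv2d(f, out_channels, [1, 1], activation_fn=None)    # 1*1: restore depth
    return tf.nn.relu(f + x)   # assumes x already has out_channels channels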

This raises another question: since the input has passed through the 3*3 convolutions, how can the output dimensions still match? The authors give three options in the paper:

1. Zero-pad the missing part of the dimensions.

2. Use the identity mapping when input and output dimensions match, and a linear projection when they do not.

3. Use a linear projection for every block.

In this article, the model mainly uses zero padding.
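
For comparison, options 2 and 3 amount to a learned "1*1 projection shortcut"; a minimal sketch of that alternative is shown below (slim-based, function name illustrative). Option 1, the zero padding chosen here, appears concretely in res_layer2d in section 2, where tf.pad fills the extra channels of the shortcut with zeros.

def projection_shortcut(x, out_channels, stride=1):
    # A 1*1 convolution with no activation: a linear projection that matches
    # the shortcut's depth (and, with stride=2, its spatial size) to F(x)
    return slim.conv2d(x, out_channels, [1, 1], stride=stride, activation_fn=None)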

That wraps up the quick theory primer. Next, let's start turning the theory into code with TensorFlow.

2. Code implementation (ResNet-34)

Parameter settings (DATA_set.py)

NUM_LABELS = 10              # number of classes (model output channels)

# Convolution parameters
CONV_SIZE = 3
CONV_DEEP = 64

# Learning / optimization parameters
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.03
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 8000
MOVING_AVERAGE_DECAY = 0.99

# Image information
IMAGE_SIZE = 28
IMAGE_COLOR_DEPH = 1

# Model save location
MODEL_SAVE_PATH = "MNIST_model/"
MODEL_NAME = "mnist_model"

# Log path
LOG_PATH = "log"

Defining the forward pass (ResNet_infernece.py)

import tensorflow as tf
import tensorflow.contrib.slim as slim
import DATA_set

# Two-layer residual module
def res_layer2d(input_tensor,
                kshape=[5, 5],
                deph=64,
                conv_stride=1,
                padding='SAME'):
    data = input_tensor
    # First convolution inside the module
    data = slim.conv2d(data,
                       num_outputs=deph,
                       kernel_size=kshape,
                       stride=conv_stride,
                       padding=padding)
    # Second convolution inside the module (no activation before the addition)
    data = slim.conv2d(data,
                       num_outputs=deph,
                       kernel_size=kshape,
                       stride=conv_stride,
                       padding=padding,
                       activation_fn=None)
    output_deep = input_tensor.get_shape().as_list()[3]
    # When the output depth differs from the input depth, zero-pad the input along the channel axis
    if output_deep != deph:
        input_tensor = tf.pad(input_tensor,
                              [[0, 0], [0, 0], [0, 0],
                               [abs(deph - output_deep) // 2,
                                abs(deph - output_deep) // 2]])
    data = tf.add(data, input_tensor)
    return data

# As the model gets deeper, downsample in height and width to reduce computation.
# A 1*1 convolution with stride 2 is used here; max pooling would work just as well.
def get_half(input_tensor, deph):
    data = input_tensor
    data = slim.conv2d(data, deph // 2, 1, stride=2)
    return data

# Stack residual modules of the same type
def res_block(input_tensor,
              kshape, deph, layer=0,
              half=False, name=None):
    data = input_tensor
    with tf.variable_scope(name):
        if half:
            data = get_half(data, deph // 2)
        for i in range(layer // 2):
            data = res_layer2d(input_tensor=data, deph=deph, kshape=kshape)
    return data

# Define the forward pass of the model
def inference(input_tensor, train=False, regularizer=None):
    with slim.arg_scope([slim.conv2d, slim.max_pool2d], stride=1, padding='SAME'):
        with tf.variable_scope("layer1-initconv"):
            data = slim.conv2d(input_tensor, DATA_set.CONV_DEEP, [7, 7])
            data = slim.max_pool2d(data, [2, 2], stride=2)

        with tf.variable_scope("resnet_layer"):
            data = res_block(input_tensor=data,
                             kshape=[DATA_set.CONV_SIZE, DATA_set.CONV_SIZE],
                             deph=DATA_set.CONV_DEEP, layer=6, half=False,
                             name="layer4-9-conv")
            data = res_block(input_tensor=data,
                             kshape=[DATA_set.CONV_SIZE, DATA_set.CONV_SIZE],
                             deph=DATA_set.CONV_DEEP * 2, layer=8, half=True,
                             name="layer10-15-conv")
            data = res_block(input_tensor=data,
                             kshape=[DATA_set.CONV_SIZE, DATA_set.CONV_SIZE],
                             deph=DATA_set.CONV_DEEP * 4, layer=12, half=True,
                             name="layer16-27-conv")
            data = res_block(input_tensor=data,
                             kshape=[DATA_set.CONV_SIZE, DATA_set.CONV_SIZE],
                             deph=DATA_set.CONV_DEEP * 8, layer=6, half=True,
                             name="layer28-33-conv")
            data = slim.avg_pool2d(data, [2, 2], stride=2)

        # Get the shape of the output feature map so it can be flattened for the fully connected layer
        data_shape = data.get_shape().as_list()
        nodes = data_shape[1] * data_shape[2] * data_shape[3]
        reshaped = tf.reshape(data, [data_shape[0], nodes])

        # Final fully connected layer
        with tf.variable_scope('layer34-fc'):
            fc_weights = tf.get_variable("weight", [nodes, DATA_set.NUM_LABELS],
                                         initializer=tf.truncated_normal_initializer(stddev=0.1))
            if regularizer is not None:
                tf.add_to_collection('losses', regularizer(fc_weights))
            fc_biases = tf.get_variable("bias", [DATA_set.NUM_LABELS],
                                        initializer=tf.constant_initializer(0.1))
            fc = tf.matmul(reshaped, fc_weights) + fc_biases
            if train:
                fc = tf.nn.dropout(fc, 0.5)   # dropout only during training
            return fc
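
As an optional sanity check before training, a minimal snippet (assuming the two files above are importable) can build the graph on a random batch and confirm that the output shape is [BATCH_SIZE, NUM_LABELS]:

import numpy as np
import tensorflow as tf
import DATA_set
import ResNet_infernece

x = tf.placeholder(tf.float32, [DATA_set.BATCH_SIZE, DATA_set.IMAGE_SIZE,
                                DATA_set.IMAGE_SIZE, DATA_set.IMAGE_COLOR_DEPH])
logits = ResNet_infernece.inference(x)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(DATA_set.BATCH_SIZE, DATA_set.IMAGE_SIZE,
                           DATA_set.IMAGE_SIZE,
                           DATA_set.IMAGE_COLOR_DEPH).astype(np.float32)
    print(sess.run(logits, feed_dict={x: batch}).shape)   # expected: (100, 10)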

Model training (MNIST_train.py)

import tensorflow as tf
import ResNet_infernece
import DATA_set
import os
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

# Define the loss function, learning rate and moving-average op.
# This is standard material, so it is not explained further here.
def train_op_data(mnist, lables, output,
                  moving_average_decay, learning_rate_base,
                  batch_size, learning_rate_daecay, global_step):
    variable_averages = tf.train.ExponentialMovingAverage(moving_average_decay,
                                                          global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=output,
        labels=tf.argmax(lables, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))
    learning_rate = tf.train.exponential_decay(
        learning_rate_base,
        global_step,
        mnist.train.num_examples / batch_size,
        learning_rate_daecay, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')
    return train_op, loss

def train(mnist):
    # Define the data entry point and the labels
    input = tf.placeholder(tf.float32,
                           [DATA_set.BATCH_SIZE, DATA_set.IMAGE_SIZE,
                            DATA_set.IMAGE_SIZE, DATA_set.IMAGE_COLOR_DEPH],
                           name='input')
    y_ = tf.placeholder(tf.float32,
                        [None, DATA_set.NUM_LABELS],
                        name='y-input')
    # L2 regularization, added to the 'losses' collection inside inference
    regularizer = tf.contrib.layers.l2_regularizer(DATA_set.REGULARIZATION_RATE)
    # Get the model output
    y = ResNet_infernece.inference(input, False, regularizer)
    # Global step
    global_step = tf.Variable(0, trainable=False)
    # Build the optimization ops
    train_op, loss = train_op_data(mnist, y_, y,
                                   DATA_set.MOVING_AVERAGE_DECAY,
                                   DATA_set.LEARNING_RATE_BASE,
                                   DATA_set.BATCH_SIZE,
                                   DATA_set.LEARNING_RATE_DECAY,
                                   global_step)
    # Initialize the TensorFlow persistence class and train the model
    saver = tf.train.Saver()
    write = tf.summary.FileWriter(DATA_set.LOG_PATH, tf.get_default_graph())
    write.close()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        for i in range(DATA_set.TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(DATA_set.BATCH_SIZE)
            reshaped_xs = np.reshape(xs, (DATA_set.BATCH_SIZE,
                                          DATA_set.IMAGE_SIZE,
                                          DATA_set.IMAGE_SIZE,
                                          DATA_set.IMAGE_COLOR_DEPH))
            _, loss_value, step = sess.run([train_op, loss, global_step],
                                           feed_dict={input: reshaped_xs, y_: ys})
            print("After %d training step(s), loss on training batch is %g."
                  % (step, loss_value))
            if i % 100 == 0:
                saver.save(sess, os.path.join(DATA_set.MODEL_SAVE_PATH,
                                              DATA_set.MODEL_NAME),
                           global_step=global_step)

def main(argv=None):
    mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
    train(mnist)

if __name__ == '__main__': tf.app.run()

Sample console output:

After 1 training step(s), loss on training batch is 2.30636.

After 2 training step(s), loss on training batch is 2.30597.

After 3 training step(s), loss on training batch is 2.30568.

After 4 training step(s), loss on training batch is 2.30372.

After 5 training step(s), loss on training batch is 2.30359.

.

.

.

After 4070 training step(s), loss on training batch is 0.0895806.

After 4072 training step(s), loss on training batch is 0.0293886.

After 4073 training step(s), loss on training batch is 0.0106446.

After 4074 training step(s), loss on training batch is 0.0337671.

After 4075 training step(s), loss on training batch is 0.0660308.

After 4078 training step(s), loss on training batch is 0.0306475.

After 4079 training step(s), loss on training batch is 0.0532619.

.

.

.

If you want the source code or would like to talk with the author, please see the extended links, or follow our Toutiao account and leave us a message.
