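For the theory behind convolutional neural networks, see: Convolutional Neural Networks.
This section is based on Stanford's CS20 course. The source code for this section has been synced to GitHub; feel free to star, share, and bookmark it!
CS20 is a TensorFlow course aimed at deep learning researchers. Today I worked through lectures 6 and 7, which were very rewarding, and I am gradually writing the material up in Jupyter notebooks. For the source code and repository, click "Read the original" or copy the link below.
Direct link: https://github.com/Light-City/Translating_documents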
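TensorFlow provides many built-in layers for convolution. Counting the dimensions of a single example (that is, excluding the batch dimension), you can feed 2-D data to a 1-D convolution, 3-D data to a 2-D convolution, and 4-D data to a 3-D convolution. The 2-D convolution is the most commonly used.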
# Function signature
tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None)
Input: Batch size (N) x Height (H) x Width (W) x Channels (C)
Filter: Height x Width x Input Channels x Output Channels
(e.g. [5, 5, 3, 64])
Strides: 4 element 1-D tensor, strides in each direction
(often [1, 1, 1, 1] or [1, 2, 2, 1])
Padding: 'SAME' or 'VALID'
Dilations: The dilation factor. If set to k > 1, there will be k-1 skipped cells between each filter element on that dimension.
Data_format: default to NHWC
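To make the padding, stride, and dilation parameters more concrete, here is a minimal pure-Python sketch of how the output length along one spatial dimension can be computed for 'SAME' and 'VALID' padding. The function name `conv_output_length` is hypothetical and is not part of TensorFlow; it only mirrors the shape rules described in the TensorFlow documentation.

```python
import math


def conv_output_length(in_len, k_len, stride, padding, dilation=1):
    """Output length along one spatial dimension (hypothetical helper)."""
    # A dilated filter spans (k_len - 1) * dilation + 1 input cells,
    # because dilation - 1 cells are skipped between filter elements.
    effective_k = (k_len - 1) * dilation + 1
    if padding == 'SAME':
        # 'SAME' pads the input so that only the stride shrinks the output.
        return math.ceil(in_len / stride)
    if padding == 'VALID':
        # 'VALID' uses no padding; the filter must fit inside the input.
        return math.ceil((in_len - effective_k + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_length(28, 5, 1, 'SAME'))               # 28
print(conv_output_length(28, 5, 1, 'VALID'))              # 24
print(conv_output_length(28, 5, 2, 'SAME'))               # 14
print(conv_output_length(28, 3, 1, 'VALID', dilation=2))  # 24
```

With strides=[1, 2, 2, 1], the height and width are each treated this way with a stride of 2, while the batch and channel dimensions keep a stride of 1.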
As a fun exercise, look at kernels.py in the GitHub repository above for the values of some well-known kernels, and at 07_run_kernels.py for how they are used.
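To get a feel for what applying a kernel means before opening those files, the following sketch slides a 3x3 kernel over a tiny image in pure Python. The two kernels are common textbook examples chosen for illustration; their values are assumptions and are not copied from kernels.py, and `correlate2d_valid` is a hypothetical helper name.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    # Each output cell is the sum of element-wise products of the kernel
    # and the image patch it currently covers.
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k_h) for dj in range(k_w))
             for j in range(out_w)]
            for i in range(out_h)]


# Illustrative kernels (assumed values, not taken from the repository).
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

image = [[1, 1, 1, 1],
         [1, 5, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]

print(correlate2d_valid(image, IDENTITY))   # [[5, 1], [1, 1]]
print(correlate2d_valid(image, LAPLACIAN))  # [[-16, 4], [4, 0]]
```

The identity kernel copies the centre pixel of each patch, while the Laplacian responds strongly around the single bright pixel and gives 0 where the patch is flat.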
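In lecture 3 we handled MNIST with logistic regression. Now let's use a CNN and see how the results compare!
We will use the following architecture: two convolutional layers with stride 1, each followed by a ReLU activation and a max-pooling layer, and finally two fully connected layers.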
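Before defining the functions, let's look at the formula for the output size. For an input of size W, a filter of size F, padding P, and stride S, the output size is (W - F + 2P) / S + 1.
In our MNIST model the input is 28x28 and the filter is 5x5, with stride 1 and padding 2, so the output size is (28 - 5 + 2*2) / 1 + 1 = 28.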
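The arithmetic above can be checked with a few lines of Python. This is only a sketch of the standard output-size formula; `conv_output_size` and `same_padding_for_stride_1` are hypothetical helper names, and the second one assumes an odd filter size.

```python
def conv_output_size(w, f, p, s):
    """Output size for input size w, filter size f, padding p, stride s."""
    return (w - f + 2 * p) // s + 1


def same_padding_for_stride_1(f):
    """Padding on each side that preserves the size when stride is 1.

    Assumes an odd filter size f.
    """
    return (f - 1) // 2


# Convolution in the MNIST model: 28x28 input, 5x5 filter, stride 1, padding 2.
print(conv_output_size(28, 5, 2, 1))  # 28
# A 5x5 filter needs padding 2 to keep the size unchanged at stride 1.
print(same_padding_for_stride_1(5))   # 2
```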
import tensorflow as tf  # TensorFlow 1.x API

def conv_relu(inputs, filters, k_size, stride, padding, scope_name):
    with tf.variable_scope(scope_name, reuse=tf.AUTO_REUSE) as scope:
        # Number of input channels (last dimension in NHWC layout)
        in_channels = inputs.shape[-1]
        # Convolution kernel: height x width x input channels x output channels
        kernel = tf.get_variable('kernel', [k_size, k_size, in_channels, filters],
                                 initializer=tf.truncated_normal_initializer())
        biases = tf.get_variable('biases', [filters],
                                 initializer=tf.random_normal_initializer())
        # Convolution result
        conv = tf.nn.conv2d(inputs, kernel, strides=[1, stride, stride, 1], padding=padding)
    # Apply ReLU to the convolution result plus the bias
    return tf.nn.relu(conv + biases, name=scope.name)
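Pooling reduces the dimensionality of the feature maps, extracts features, and shortens execution time.
Max-pooling or average-pooling is typically used.
Since this model uses max-pooling, we define a max-pooling function, shown below.
In our model the input is 28x28, the pool size is 2x2, the stride is 2, and the padding is zero, so the output size is (28 - 2 + 2*0) / 2 + 1 = 14.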
def maxpool(inputs, ksize, stride, padding='VALID', scope_name='pool'):
    with tf.variable_scope(scope_name, reuse=tf.AUTO_REUSE) as scope:
        pool = tf.nn.max_pool(inputs,
                              ksize=[1, ksize, ksize, 1],
                              strides=[1, stride, stride, 1],
                              padding=padding)
    return pool
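To illustrate what max-pooling does to a single feature map, here is a small pure-Python sketch with no padding. The helper name `max_pool_2d` is hypothetical and stands in for what tf.nn.max_pool computes per channel with 'VALID' padding.

```python
def max_pool_2d(feature_map, ksize, stride):
    """Max-pool one 2-D feature map without padding."""
    out_h = (len(feature_map) - ksize) // stride + 1
    out_w = (len(feature_map[0]) - ksize) // stride + 1
    # Each output cell keeps the largest value in its ksize x ksize window.
    return [[max(feature_map[i * stride + di][j * stride + dj]
                 for di in range(ksize) for dj in range(ksize))
             for j in range(out_w)]
            for i in range(out_h)]


fm = [[1, 3, 2, 4],
      [5, 6, 7, 8],
      [3, 2, 1, 0],
      [1, 2, 3, 4]]

print(max_pool_2d(fm, 2, 2))  # [[6, 8], [3, 4]]
```

A 2x2 window with stride 2 halves each spatial dimension, which is why a 28x28 map becomes 14x14.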
def fully_connected(inputs, out_dim, scope_name='fc'):
    with tf.variable_scope(scope_name, reuse=tf.AUTO_REUSE) as scope:
        in_dim = inputs.shape[-1]
        w = tf.get_variable('weights', [in_dim, out_dim],
                            initializer=tf.truncated_normal_initializer())
        b = tf.get_variable('biases', [out_dim],
                            initializer=tf.constant_initializer(0.0))
        out = tf.matmul(inputs, w) + b
    return out
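Now let's build the whole model by combining the functions we created, applying them in order.
One thing to note: when moving from the last pooling layer to the fc layer, the 3-D array for each example must be reshaped into a 1-D vector whose length is the product of the lengths of the original array's dimensions (here 7 x 7 x 64 = 3136).
Finally, dropout is applied to the fc layer.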
def inference(self):
    conv1 = conv_relu(inputs=self.img,
                      filters=32,
                      k_size=5,
                      stride=1,
                      padding='SAME',
                      scope_name='conv1')
    pool1 = maxpool(conv1, 2, 2, 'VALID', 'pool1')
    conv2 = conv_relu(inputs=pool1,
                      filters=64,
                      k_size=5,
                      stride=1,
                      padding='SAME',
                      scope_name='conv2')
    pool2 = maxpool(conv2, 2, 2, 'VALID', 'pool2')
    # Flatten each example: height * width * channels
    feature_dim = pool2.shape[1] * pool2.shape[2] * pool2.shape[3]
    pool2 = tf.reshape(pool2, [-1, feature_dim])
    fc = tf.nn.relu(fully_connected(pool2, 1024, 'fc'))
    # tf.layers.dropout expects the fraction to drop (rate), not the keep probability
    dropout = tf.layers.dropout(fc, rate=1 - self.keep_prob, training=self.training, name='dropout')
    self.logits = fully_connected(dropout, self.n_classes, 'logits')
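The shapes produced by this stack of layers can be traced by hand. The sketch below does so in pure Python, assuming each MNIST image enters the network as 28 x 28 x 1 (height, width, channels); the helper names `same_conv` and `valid_pool` are hypothetical and only track shapes, not values.

```python
import math


def same_conv(shape, filters, stride=1):
    """Shape after a 'SAME' convolution: only the stride shrinks H and W."""
    h, w, _ = shape
    return (math.ceil(h / stride), math.ceil(w / stride), filters)


def valid_pool(shape, ksize, stride):
    """Shape after 'VALID' pooling: channels are unchanged."""
    h, w, c = shape
    return ((h - ksize) // stride + 1, (w - ksize) // stride + 1, c)


shape = (28, 28, 1)              # assumed input: one grayscale MNIST image
shape = same_conv(shape, 32)     # conv1 -> (28, 28, 32)
shape = valid_pool(shape, 2, 2)  # pool1 -> (14, 14, 32)
shape = same_conv(shape, 64)     # conv2 -> (14, 14, 64)
shape = valid_pool(shape, 2, 2)  # pool2 -> (7, 7, 64)

# Length of the flattened vector fed to the first fully connected layer.
feature_dim = shape[0] * shape[1] * shape[2]
print(shape, feature_dim)        # (7, 7, 64) 3136
```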
def loss(self):
    '''
    define loss function
    use softmax cross entropy with logits as the loss function
    compute mean cross entropy, softmax is applied internally
    '''
    with tf.name_scope('loss'):
        entropy = tf.nn.softmax_cross_entropy_with_logits(labels=self.label, logits=self.logits)
        self.loss = tf.reduce_mean(entropy, name='loss')
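As the docstring notes, softmax is applied internally, so the function receives raw logits. A minimal pure-Python sketch of the same computation for one-hot labels might look like this; `softmax_cross_entropy` is a hypothetical helper, and the logits and labels are made-up values for illustration.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy between a one-hot label and softmax(logits)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(y * lp for y, lp in zip(one_hot_label, log_probs))


# Made-up batch of two examples with three classes.
batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
batch_labels = [[1, 0, 0], [0, 1, 0]]

losses = [softmax_cross_entropy(z, y)
          for z, y in zip(batch_logits, batch_labels)]
# Mean over the batch, corresponding to tf.reduce_mean in the loss above.
mean_loss = sum(losses) / len(losses)
print(losses, mean_loss)
```

For the second example all logits are equal, so softmax assigns 1/3 to each class and the loss is ln 3.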
During training, we need to evaluate the accuracy after each epoch.
def eval(self):
    '''
    Count the number of right predictions in a batch
    '''
    with tf.name_scope('predict'):
        preds = tf.nn.softmax(self.logits)
        correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(self.label, 1))
        # Note: this is the count of correct predictions, not a ratio
        self.accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32))
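Because the op above sums the correct predictions in a batch, the accuracy for an epoch has to be obtained by dividing the accumulated count by the number of evaluated examples. The sketch below shows that bookkeeping in pure Python; `count_correct` is a hypothetical helper, the batches are made-up values, and the division step is an assumption about how the training loop uses the count.

```python
def count_correct(batch_logits, batch_labels):
    """Number of examples whose predicted class matches the one-hot label."""
    def argmax(values):
        return max(range(len(values)), key=lambda i: values[i])
    # Softmax preserves ordering, so argmax over logits gives the same class.
    return sum(argmax(z) == argmax(y)
               for z, y in zip(batch_logits, batch_labels))


# Two made-up batches of two-class predictions and one-hot labels.
batches = [
    ([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], [[0, 1], [1, 0], [1, 0]]),
    ([[0.6, 0.4], [0.2, 0.8]], [[1, 0], [0, 1]]),
]

total_correct = sum(count_correct(z, y) for z, y in batches)
n_examples = sum(len(y) for _, y in batches)
print(total_correct / n_examples)  # 0.8
```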
conv:Tensor("conv1_1:0", shape=(?, 28, 28, 32), dtype=float32)
pool1:Tensor("pool1/MaxPool:0", shape=(?, 14, 14, 32), dtype=float32)
conv2:Tensor("conv2_1:0", shape=(?, 14, 14, 64), dtype=float32)
pool2:Tensor("pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32)
feature_dim:3136
pool2:Tensor("Reshape:0", shape=(?, 3136), dtype=float32)
fc:Tensor("fc/add:0", shape=(?, 1024), dtype=float32)
dropout:Tensor("relu_dropout/mul:0", shape=(?, 1024), dtype=float32)
self.logits:Tensor("logits/add:0", shape=(?, 10), dtype=float32)
...
...
...
Loss at step 19: 15894.556640625
Loss at step 39: 8952.953125
Loss at step 59: 6065.05322265625
Loss at step 79: 2913.25048828125
Loss at step 99: 2803.952392578125
Loss at step 119: 1727.0462646484375
Loss at step 139: 2886.213134765625
Loss at step 159: 2611.1953125
Loss at step 179: 1743.4693603515625
Loss at step 199: 898.48046875
Loss at step 219: 2171.2890625
Loss at step 239: 475.59246826171875
Loss at step 259: 1289.218017578125
Loss at step 279: 933.6298828125
Loss at step 299: 614.7198486328125
Loss at step 319: 1771.800048828125
Loss at step 339: 1211.3431396484375
Loss at step 359: 1274.873291015625
Loss at step 379: 820.397705078125
Loss at step 399: 633.9185791015625
Loss at step 419: 830.4837646484375
Average loss at epoch 0: 3882.1572788682097
...
...
...
Average loss at epoch 29: 3.834926734323245
Took: 13.498510360717773 seconds
Accuracy at epoch 29: 0.9825
Took: 0.7468070983886719 seconds