Python Programming Task | Stanford CS231n: Deep Learning and Computer Vision Course

Assignment 3

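04 Python Programming Task (2-Layer Neural Network)

· For the neural-network part of Assignment 1, we need to complete neural_net.py. Once it is done, use the code in two_layer_net.ipynb (some of which you must fill in yourself) to debug the model, tune the hyperparameters, pick the best model, and finally measure classification accuracy on the test set.

· The image dataset is still CIFAR-10.

The code for neural_net.py is as follows: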

__coauthor__ = 'Deeplayer'
# 6.14.2016

import numpy as np


class TwoLayerNet(object):
    """
    A two-layer fully-connected neural network.
    The net has an input dimension of D, a hidden layer dimension of H,
    and performs classification over C classes.
    The network has the following architecture:
    input - fully connected layer - ReLU - fully connected layer - softmax
    The outputs of the second fully-connected layer are the scores for each class.
    """

    def __init__(self, input_size, hidden_size, output_size, std=1e-4):
        # Small random weights, zero biases
        self.params = {}
        self.params['W1'] = std * np.random.randn(input_size, hidden_size)    # (D,H)
        self.params['b1'] = np.zeros((1, hidden_size))                        # (1,H)
        self.params['W2'] = std * np.random.randn(hidden_size, output_size)   # (H,C)
        self.params['b2'] = np.zeros((1, output_size))                        # (1,C)

    def loss(self, X, y=None, reg=0.0):
        """
        Compute the loss and gradients for a two layer fully connected neural network.
        If y is None, return the class scores of shape (N, C) instead.
        """
        # Unpack variables from the params dictionary
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        N, D = X.shape

        # Compute the forward pass
        h1 = ReLU(np.dot(X, W1) + b1)      # hidden layer 1  (N,H)
        scores = np.dot(h1, W2) + b2       # output layer    (N,C)
        if y is None:
            return scores

        # Compute the loss
        # Subtract the row-wise max for numeric stability
        scores_max = np.max(scores, axis=1, keepdims=True)    # (N,1)
        # Compute the class probabilities
        exp_scores = np.exp(scores - scores_max)              # (N,C)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)    # (N,C)
        # cross-entropy loss and L2-regularization
        correct_logprobs = -np.log(probs[range(N), y])        # (N,)
        data_loss = np.sum(correct_logprobs) / N
        reg_loss = 0.5 * reg * np.sum(W1 * W1) + 0.5 * reg * np.sum(W2 * W2)
        loss = data_loss + reg_loss

        # Backward pass: compute gradients
        grads = {}
        # Gradient of the loss w.r.t. the scores; probs is no longer needed,
        # so it is modified in place
        dscores = probs          # (N,C)
        dscores[range(N), y] -= 1
        dscores /= N
        # Backprop into W2 and b2
        dW2 = np.dot(h1.T, dscores)                     # (H,C)
        db2 = np.sum(dscores, axis=0, keepdims=True)    # (1,C)
        # Backprop into hidden layer
        dh1 = np.dot(dscores, W2.T)                     # (N,H)
        # Backprop into ReLU non-linearity
        dh1[h1 <= 0] = 0
        # Backprop into W1 and b1
        dW1 = np.dot(X.T, dh1)                          # (D,H)
        db1 = np.sum(dh1, axis=0, keepdims=True)        # (1,H)
        # Add the regularization gradient contribution
        dW2 += reg * W2
        dW1 += reg * W1
        grads['W1'] = dW1
        grads['b1'] = db1
        grads['W2'] = dW2
        grads['b2'] = db2

        return loss, grads

    def train(self, X, y, X_val, y_val, learning_rate=1e-3,
              learning_rate_decay=0.95, reg=1e-5, mu=0.9, num_epochs=10,
              mu_increase=1.0, batch_size=200, verbose=False):
        """
        Train this neural network using stochastic gradient descent with momentum.
        Inputs:
        - X: A numpy array of shape (N, D) giving training data.
        - y: A numpy array of shape (N,) giving training labels; y[i] = c means that
          X[i] has label c, where 0 <= c < C.
        - X_val: A numpy array of shape (N_val, D) giving validation data.
        - y_val: A numpy array of shape (N_val,) giving validation labels.
        - learning_rate: Scalar giving learning rate for optimization.
        - learning_rate_decay: Scalar giving factor used to decay the learning rate
          after each epoch.
        - reg: Scalar giving regularization strength.
        - mu: Scalar giving the momentum coefficient.
        - num_epochs: Number of passes over the training data.
        - mu_increase: Scalar factor applied to mu after each epoch.
        - batch_size: Number of training examples to use per step.
        - verbose: boolean; if true print progress during optimization.
        """
        num_train = X.shape[0]
        # Integer division so the result can be used as a loop bound
        iterations_per_epoch = max(num_train // batch_size, 1)
        # Use SGD with momentum to optimize the parameters
        v_W2, v_b2 = 0.0, 0.0
        v_W1, v_b1 = 0.0, 0.0
        loss_history = []
        train_acc_history = []
        val_acc_history = []

        for it in range(1, num_epochs * iterations_per_epoch + 1):
            # Sampling with replacement is faster than sampling without replacement.
            sample_index = np.random.choice(num_train, batch_size, replace=True)
            X_batch = X[sample_index, :]        # (batch_size,D)
            y_batch = y[sample_index]           # (batch_size,)

            # Compute loss and gradients using the current minibatch
            loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
            loss_history.append(loss)

            # Perform parameter update (with momentum)
            v_W2 = mu * v_W2 - learning_rate * grads['W2']
            self.params['W2'] += v_W2
            v_b2 = mu * v_b2 - learning_rate * grads['b2']
            self.params['b2'] += v_b2
            v_W1 = mu * v_W1 - learning_rate * grads['W1']
            self.params['W1'] += v_W1
            v_b1 = mu * v_b1 - learning_rate * grads['b1']
            self.params['b1'] += v_b1

            # Optional per-iteration logging (disabled):
            # if verbose and it % 100 == 0:
            #     print('iteration %d / %d: loss %f' %
            #           (it, num_epochs * iterations_per_epoch, loss))

            # Every epoch, check train and val accuracy and decay learning rate.
            # The histories and the decay no longer depend on `verbose`;
            # only the printing does.
            if it % iterations_per_epoch == 0:
                # Check accuracy (train accuracy is measured on the last minibatch)
                epoch = it // iterations_per_epoch
                train_acc = (self.predict(X_batch) == y_batch).mean()
                val_acc = (self.predict(X_val) == y_val).mean()
                train_acc_history.append(train_acc)
                val_acc_history.append(val_acc)
                if verbose:
                    print('epoch %d / %d: loss %f, train_acc: %f, val_acc: %f' %
                          (epoch, num_epochs, loss, train_acc, val_acc))
                # Decay learning rate
                learning_rate *= learning_rate_decay
                # Increase mu
                mu *= mu_increase

        return {
            'loss_history': loss_history,
            'train_acc_history': train_acc_history,
            'val_acc_history': val_acc_history,
        }

    def predict(self, X):
        """
        Inputs:
        - X: A numpy array of shape (N, D) giving N D-dimensional data points to classify.
        Returns:
        - y_pred: A numpy array of shape (N,) giving predicted labels for each of
          the elements of X. For all i, y_pred[i] = c means that X[i] is
          predicted to have class c, where 0 <= c < C.
        """
        h1 = ReLU(np.dot(X, self.params['W1']) + self.params['b1'])
        scores = np.dot(h1, self.params['W2']) + self.params['b2']
        y_pred = np.argmax(scores, axis=1)

        return y_pred


def ReLU(x):
    """ReLU non-linearity."""
    return np.maximum(0, x)
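
The loss() method above subtracts the row-wise maximum from the scores before exponentiating. The following is a minimal sketch of why that shift matters. The helper names softmax_naive and softmax_stable are made up for this illustration and are not part of neural_net.py.

```python
import numpy as np

def softmax_naive(scores):
    """Softmax without the max-shift; overflows for large scores."""
    exp_scores = np.exp(scores)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

def softmax_stable(scores):
    """Softmax with the per-row max subtracted first, as in loss() above."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])
large = small + 1000.0      # same score differences, much larger magnitude

# Silence the overflow / invalid-value warnings that the naive version triggers.
with np.errstate(over='ignore', invalid='ignore'):
    naive_large = softmax_naive(large)   # exp(1001) overflows to inf, inf / inf is nan

stable_small = softmax_stable(small)
stable_large = softmax_stable(large)

print(naive_large)
print(stable_large)
```

Adding a constant to every score in a row does not change the softmax output, so the stable version returns the same probabilities for `small` and `large`, while the naive version returns nan for `large`.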

After completing neural_net.py, check that your implementation is correct using the checks in two_layer_net.ipynb. Once it passes, the next step is to tune the hyperparameters.
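
One common way to check a backward pass is to compare the analytic gradients against centered finite differences. The sketch below is not the notebook's own check: it re-implements the same forward and backward computation on a plain dict so that it runs on its own, and the names two_layer_loss, numerical_gradient and rel_error, along with the toy sizes, are assumptions made for this example.

```python
import numpy as np

def two_layer_loss(params, X, y, reg):
    """Same forward/backward computation as TwoLayerNet.loss, on a plain dict."""
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    N = X.shape[0]
    h1 = np.maximum(0, X.dot(W1) + b1)
    scores = h1.dot(W2) + b2
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()
    loss += 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))

    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    dh1 = dscores.dot(W2.T)
    dh1[h1 <= 0] = 0
    grads = {
        'W2': h1.T.dot(dscores) + reg * W2,
        'b2': dscores.sum(axis=0, keepdims=True),
        'W1': X.T.dot(dh1) + reg * W1,
        'b1': dh1.sum(axis=0, keepdims=True),
    }
    return loss, grads

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference gradient of the scalar f() w.r.t. array x (x is restored)."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        f_plus = f()
        x[idx] = old - h
        f_minus = f()
        x[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad

def rel_error(a, b):
    """Maximum element-wise relative error between two arrays."""
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

# Tiny toy problem (sizes chosen only to keep the check fast).
rng = np.random.RandomState(0)
D, H, C, N = 4, 10, 3, 5
params = {
    'W1': 1e-1 * rng.randn(D, H), 'b1': np.zeros((1, H)),
    'W2': 1e-1 * rng.randn(H, C), 'b2': np.zeros((1, C)),
}
X = 10 * rng.randn(N, D)
y = np.array([0, 1, 2, 2, 1])

_, grads = two_layer_loss(params, X, y, reg=0.1)
errors = {}
for name in sorted(params):
    f = lambda: two_layer_loss(params, X, y, reg=0.1)[0]
    num = numerical_gradient(f, params[name])
    errors[name] = rel_error(num, grads[name])
    print('%s max relative error: %e' % (name, errors[name]))
```

If the backward pass is right, the relative errors should be very small. A large error in one parameter usually points to the backprop step that produces that gradient.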

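Here I give the code for my best model and a visualization of the first-layer weights W1. There is still room to improve the accuracy, and readers are welcome to post higher scores.

The code for nn_twolayer_best.py is as follows: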

__coauthor__ = 'Deeplayer'
# 6.16.2016 

import numpy as np
import matplotlib.pyplot as plt
from neural_net import TwoLayerNet
from data_utils import load_CIFAR10
from vis_utils import visualize_grid

# Load the data
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):    
    """    
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare    
    it for the two-layer neural net classifier. These are the same steps as    
    we used for the SVM, but condensed to a single function.    
    """    
    # Load the raw CIFAR-10 data    
    cifar10_dir = 'E:/PycharmProjects/ML/CS231n/cifar-10-batches-py'   # change this to your local CIFAR-10 path
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)    
    # Subsample the data    
    mask = range(num_training, num_training + num_validation)    
    X_val = X_train[mask]     # (1000,32,32,3)    
    y_val = y_train[mask]     # (1000L,)   
    mask = range(num_training)    
    X_train = X_train[mask]   # (49000,32,32,3)    
    y_train = y_train[mask]   # (49000L,)    
    mask = range(num_test)   
    X_test = X_test[mask]    # (1000,32,32,3)    
    y_test = y_test[mask]    # (1000L,)    

    # preprocessing: subtract the mean image    
    mean_image = np.mean(X_train, axis=0)    
    X_train -= mean_image   
    X_val -= mean_image    
    X_test -= mean_image    

    # Reshape data to rows    
    X_train = X_train.reshape(num_training, -1)    # (49000,3072)    
    X_val = X_val.reshape(num_validation, -1)     # (1000,3072)    
    X_test = X_test.reshape(num_test, -1)         # (1000,3072)    

    return X_train, y_train, X_val, y_val, X_test, y_test

# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: %s' % (X_train.shape,))
print('Train labels shape: %s' % (y_train.shape,))
print('Validation data shape: %s' % (X_val.shape,))
print('Validation labels shape: %s' % (y_val.shape,))
print('Test data shape: %s' % (X_test.shape,))
print('Test labels shape: %s' % (y_test.shape,))

# Look for the best net
best_net = None      # store the best model into this
input_size = 32 * 32 * 3
hidden_size = 100
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

"""
# Random search over reg and lr (kept disabled inside this string).
# Note: `net` is reused across trials here, so each trial starts from the
# previous trial's weights; create a new TwoLayerNet inside the loop if you
# want independent trials.
max_count = 100
for count in range(1, max_count + 1):
    reg = 10 ** np.random.uniform(-4, 1)
    lr = 10 ** np.random.uniform(-5, -3)
    stats = net.train(X_train, y_train, X_val, y_val, num_epochs=5,
                      batch_size=200, mu=0.5, mu_increase=1.0, learning_rate=lr,
                      learning_rate_decay=0.95, reg=reg, verbose=True)

    print('val_acc: %f, lr: %s, reg: %s, (%d / %d)' %
          (stats['val_acc_history'][-1], format(lr, 'e'), format(reg, 'e'), count, max_count))

# according to the above experiment, reg ~= 0.9,  lr ~= 5e-4
"""

stats = net.train(X_train, y_train, X_val, y_val,
                  num_epochs=40, batch_size=400, mu=0.5,
                  mu_increase=1.0, learning_rate=5e-4,
                  learning_rate_decay=0.95, reg=0.9, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: %f' % val_acc)    # about 52.7%

# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.ylim([0, 0.8])
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend(bbox_to_anchor=(1.0, 0.4))
plt.grid(True)
plt.show()

best_net = net
# Run on the test set
test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy: %f' % test_acc)    # about 54.6%

# Visualize the weights of the best network
def show_net_weights(net):    
    W1 = net.params['W1']    
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)    
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))    
    plt.gca().axis('off')   
    plt.show()


show_net_weights(best_net)
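
The disabled search in the script above draws lr and reg as 10 raised to a uniformly sampled exponent, which spreads the trials evenly across orders of magnitude. Here is a minimal, self-contained sketch of that sampling pattern. The function toy_val_acc is a made-up stand-in for "train a net and return its validation accuracy", with an arbitrarily chosen peak, so the numbers it produces say nothing about CIFAR-10.

```python
import numpy as np

def sample_log_uniform(low_exp, high_exp, rng):
    """Draw 10**u with u uniform in [low_exp, high_exp)."""
    return 10 ** rng.uniform(low_exp, high_exp)

def toy_val_acc(lr, reg):
    """Hypothetical score with an arbitrary peak at lr = 1e-4, reg = 1e-2.
    In a real search this would train a fresh net and return its val accuracy."""
    return 0.5 - 0.05 * (np.log10(lr / 1e-4) ** 2 + np.log10(reg / 1e-2) ** 2)

rng = np.random.RandomState(0)
best = None
for trial in range(100):
    lr = sample_log_uniform(-5, -3, rng)     # same exponent range as the script
    reg = sample_log_uniform(-4, 1, rng)     # same exponent range as the script
    acc = toy_val_acc(lr, reg)
    if best is None or acc > best[0]:
        best = (acc, lr, reg)

print('best toy score %.3f at lr=%.2e, reg=%.2e' % best)
```

A common follow-up is to narrow the exponent ranges around the best trials and search again with more epochs per trial.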

loss.png

W1.png

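A final few words: how do neural networks differ from, and relate to, linear classifiers (SVM and Softmax)? A neural network can be viewed as a nonlinear classifier (though it is more than that). For classification there is in fact an even more important step that we have not discussed: feature extraction. Good feature extraction can greatly improve classification accuracy. The linear classifiers from earlier classify directly on the (preprocessed) raw pixels, so they do not perform well. The hidden layer of a (fully connected) neural network can be viewed as performing (global) feature extraction, although in practice it extracts hardly any real features (its weights look more like templates). The good news is that these features do not have to be chosen by hand; the network learns them entirely on its own! So a neural network for classification can be split into two parts: feature extraction plus a linear classifier. Strictly speaking, it is convolutional neural networks that truly achieve this.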
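
The "feature extraction plus linear classifier" view can be illustrated on XOR, which no linear classifier can separate on the raw inputs. In the sketch below the hidden-layer weights are hand-picked for illustration rather than learned, and the 0.5 threshold is likewise an assumption of this example.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# The four XOR inputs and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hand-picked (not learned) hidden layer: two ReLU features of s = x1 + x2.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
features = relu(X.dot(W1) + b1)     # (4, 2): columns are s and max(s - 1, 0)

# Linear classifier on top of the features: score = f1 - 2 * f2.
w2 = np.array([1.0, -2.0])
scores = features.dot(w2)
pred = (scores > 0.5).astype(int)

print(features)
print(pred)
```

The second stage is purely linear. The ReLU hidden layer is what makes the classes separable, and in the two-layer net above the hidden activations h1 play the same role.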

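Unfortunately, as a neural network gets deeper, the loss function becomes increasingly prone to getting stuck in local optima, and these "traps" lie farther and farther from the true global optimum (because our weights are randomly initialized). As a result, a (fully connected) deep neural network (DNN) trained on limited data can perform worse than a shallower network. In addition, the "vanishing gradient" problem grows more severe as layers are added. Both problems, however, have been greatly alleviated: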

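1. In 2006, Hinton published two papers, Reducing the Dimensionality of Data with Neural Networks and A Fast Learning Algorithm for Deep Belief Nets, which used pre-training to alleviate the local-optimum problem. The idea is to pre-train the network's weights one layer at a time with an unsupervised, greedy layer-wise learning algorithm (each layer is trained as a restricted Boltzmann machine), and then fine-tune all the weights with backpropagation on labeled data.

2. Activation functions we covered earlier, such as ReLU and Maxout, do a good job of overcoming the "vanishing gradient" problem, and the later Batch Normalization is even more powerful.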
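
To make the vanishing-gradient point concrete, the sketch below multiplies the local activation derivatives along a single path through 10 layers. It is a simplification that ignores the weight matrices, and it assumes the same positive pre-activation of 0.5 at every layer; both are choices made only for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # never larger than 0.25

def relu_grad(z):
    return (z > 0).astype(float)    # exactly 1 for positive inputs

depth = 10
z = 0.5 * np.ones(depth)            # assumed pre-activation at every layer

# Product of the local derivatives along one path (weights ignored).
sigmoid_factor = np.prod(sigmoid_grad(z))
relu_factor = np.prod(relu_grad(z))

print('sigmoid: %.3e' % sigmoid_factor)
print('relu:    %.3e' % relu_factor)
```

Because the sigmoid derivative is at most 0.25, the product shrinks at least geometrically with depth, while an active ReLU passes the gradient through unchanged.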

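In the 2012 ImageNet competition, a CNN won by an overwhelming margin, and the great wave of deep learning truly began. Since then, the pre-training approach has been all but abandoned, probably because enough data is now available.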

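The power of convolutional neural networks (CNNs) comes from their very strong (local) feature extraction ability. These features are abstracted layer by layer: the features of one layer are combinations of features from the layer before it. The deeper the network, the more numerous and the more abstract these combinations become.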
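
One way to see how each layer combines features from the layer before it is to count how many input positions can influence a single unit as convolution layers are stacked. The sketch below does this for 1-D convolutions with stride 1. The function name receptive_field_width and the kernel size of 3 are assumptions made for this example.

```python
import numpy as np

def receptive_field_width(num_layers, kernel_size=3):
    """Count the input positions that can influence one output unit after
    stacking num_layers 1-D convolutions (stride 1) of the given kernel size."""
    response = np.array([1.0])          # start from a single output unit
    kernel = np.ones(kernel_size)
    for _ in range(num_layers):
        # 'full' convolution widens the footprint by kernel_size - 1 per layer
        response = np.convolve(response, kernel)
    return int(np.count_nonzero(response))

widths = [receptive_field_width(n) for n in (1, 2, 3)]
print(widths)
```

Each extra layer sees a wider region of the input, so deeper units can represent combinations of the local patterns detected below them.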

Originally published on the WeChat official account 人工智能LeadAI (atleadai)

Original publication date: 2017-09-07
