# Fully Connected Neural Networks (Part 2)

Contents:

- 0. Preface
- 1. Batch Normalization
  - 1.1 What is BN?
  - 1.2 Forward pass
  - 1.3 Backward pass
- 2. Dropout
  - 2.1 What is Dropout?
  - 2.2 Forward pass
  - 2.3 Backward pass
- 3. A Fully Connected Network with an Arbitrary Number of Hidden Layers
- 4. Training the Model
- 5. A Word from the Author

## 0. Preface

OK, let's continue with the second post on the fully connected neural network from cs231n's assignment2. This post focuses on building a fully connected network with an arbitrary number of layers. Let's get hands-on!

## 1.Batch Normalization

### 1.1 What is BN?

Batch Normalization (BN) normalizes each feature of a mini-batch to zero mean and unit variance, then applies a learned scale (gamma) and shift (beta). Keeping layer inputs in a stable distribution makes deep networks less sensitive to weight initialization and allows larger learning rates. For a detailed derivation of the gradients, the blog post "Understanding the backward pass through Batch Normalization Layer" is a good reference.
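During training, BN computes the following per feature dimension over a mini-batch of N samples:

```
\mu_B = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma_B^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_B)^2
```

```
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y_i = \gamma\,\hat{x}_i + \beta
```

Here \gamma and \beta are learned parameters and \epsilon guards against division by zero; these are exactly the quantities `sample_mean`, `sample_var`, `x_normalize`, and `out` in the code below.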

### 1.2 Forward pass

During training, running estimates of the per-feature mean and variance are maintained with exponential decay, so that test-time normalization does not depend on mini-batch statistics:

```python
running_mean = momentum * running_mean + (1 - momentum) * sample_mean
running_var = momentum * running_var + (1 - momentum) * sample_var
```

```
Inputs:
- x: Data of shape (N, D)
- gamma: Scale parameter of shape (D,)
- beta: Shift parameter of shape (D,)
- bn_param: Dictionary with the following keys:
  - mode: 'train' or 'test'; required
  - eps: Constant for numeric stability
  - momentum: Constant for running mean / variance.
  - running_mean: Array of shape (D,) giving running mean of features
  - running_var: Array of shape (D,) giving running variance of features

Returns:
- out: Output of shape (N, D)
- cache: A tuple of values needed in the backward pass
```

```python
def batchnorm_forward(x, gamma, beta, bn_param):
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)
    momentum = bn_param.get('momentum', 0.9)

    N, D = x.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))

    out, cache = None, None
    if mode == 'train':
        # mini-batch mean mu_B, shape (1, D)
        sample_mean = np.mean(x, axis=0, keepdims=True)
        # mini-batch variance sigma_B^2, shape (1, D)
        sample_var = np.var(x, axis=0, keepdims=True)
        # normalize, shape (N, D)
        x_normalize = (x - sample_mean) / np.sqrt(sample_var + eps)
        # scale and shift
        out = gamma * x_normalize + beta
        cache = (x_normalize, gamma, beta, sample_mean, sample_var, x, eps)
        # update the running statistics
        running_mean = momentum * running_mean + (1 - momentum) * sample_mean
        running_var = momentum * running_var + (1 - momentum) * sample_var
    elif mode == 'test':
        # at test time, normalize with the accumulated running statistics
        x_normalize = (x - running_mean) / np.sqrt(running_var + eps)
        out = gamma * x_normalize + beta
    else:
        raise ValueError('Invalid forward batchnorm mode "%s"' % mode)

    # Store the updated running means back into bn_param
    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var

    return out, cache
```
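As a quick sanity check of the train-mode branch, per-feature normalization should leave each column with near-zero mean and unit variance before gamma and beta are applied (a standalone numpy sketch, not part of the assignment code):

```python
import numpy as np

# Toy batch: 4 samples, 3 features, deliberately shifted and scaled
rng = np.random.default_rng(0)
x = 5.0 + 2.0 * rng.standard_normal((4, 3))
eps = 1e-5

# Per-feature statistics over the batch dimension
mean = x.mean(axis=0, keepdims=True)
var = x.var(axis=0, keepdims=True)

# Normalized activations: roughly zero mean, unit variance per feature
x_hat = (x - mean) / np.sqrt(var + eps)

print(np.allclose(x_hat.mean(axis=0), 0.0, atol=1e-7))
print(np.allclose(x_hat.var(axis=0), 1.0, atol=1e-3))
```

The variance comes out slightly below 1 because of the eps term in the denominator, which is exactly the behavior of the forward pass above.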

### 1.3 Backward pass

```
Inputs:
- dout: Upstream derivatives, of shape (N, D)
- cache: Variable of intermediates from batchnorm_forward.

Returns:
- dx: Gradient with respect to inputs x, of shape (N, D)
- dgamma: Gradient with respect to scale parameter gamma, of shape (D,)
- dbeta: Gradient with respect to shift parameter beta, of shape (D,)
```

```python
def batchnorm_backward(dout, cache):
    dx, dgamma, dbeta = None, None, None
    x_normalized, gamma, beta, sample_mean, sample_var, x, eps = cache
    N, D = x.shape
    dx_normalized = dout * gamma                        # [N,D]
    x_mu = x - sample_mean                              # [N,D]
    sample_std_inv = 1.0 / np.sqrt(sample_var + eps)    # [1,D]
    # gradient w.r.t. the mini-batch variance
    dsample_var = -0.5 * np.sum(dx_normalized * x_mu, axis=0, keepdims=True) * sample_std_inv**3
    # gradient w.r.t. the mini-batch mean
    dsample_mean = -1.0 * np.sum(dx_normalized * sample_std_inv, axis=0, keepdims=True) - \
                   2.0 * dsample_var * np.mean(x_mu, axis=0, keepdims=True)
    # x influences the output directly, through the variance, and through the mean
    dx1 = dx_normalized * sample_std_inv
    dx2 = 2.0 / N * dsample_var * x_mu
    dx = dx1 + dx2 + 1.0 / N * dsample_mean
    dgamma = np.sum(dout * x_normalized, axis=0, keepdims=True)
    dbeta = np.sum(dout, axis=0, keepdims=True)
    return dx, dgamma, dbeta
```
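A cheap way to validate a hand-derived gradient such as dgamma is a centered finite-difference check on a tiny batch. The sketch below re-implements a minimal train-mode forward pass just for this purpose (it is not the assignment's code, and the batch sizes are arbitrary):

```python
import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):
    # Minimal train-mode batchnorm forward, re-implemented for this check
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 4))
gamma = rng.standard_normal(4)
beta = rng.standard_normal(4)
dout = rng.standard_normal((5, 4))

# Analytic gradient, as in batchnorm_backward above
eps = 1e-5
x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
dgamma = np.sum(dout * x_hat, axis=0)

# Centered finite differences on the scalar f(gamma) = sum(out * dout)
h = 1e-5
dgamma_num = np.zeros_like(gamma)
for i in range(gamma.size):
    gp, gm = gamma.copy(), gamma.copy()
    gp[i] += h
    gm[i] -= h
    fp = np.sum(bn_forward(x, gp, beta) * dout)
    fm = np.sum(bn_forward(x, gm, beta) * dout)
    dgamma_num[i] = (fp - fm) / (2 * h)

print(np.allclose(dgamma, dgamma_num, atol=1e-6))
```

The same pattern, applied to every input of the layer, is how the assignment's gradient checks work.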

## 2.Dropout

### 2.1 What is Dropout?

Dropout can be understood as a regularization technique for suppressing overfitting: during training, each neuron's activation is kept with probability p and zeroed out otherwise, so the network cannot rely too heavily on any single unit.
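The variant used in cs231n is inverted dropout: surviving activations are divided by p during training, so the expected activation matches the test-time (identity) pass and no rescaling is needed at test time. A small standalone sketch (the batch size and seed here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.5                     # probability of keeping a neuron
x = np.ones((200000, 1))    # large batch so the average is stable

# Inverted dropout: scale the surviving activations by 1/p
mask = (rng.random(x.shape) < p) / p
out_train = x * mask

# Test-time dropout is the identity, so out_test.mean() would be exactly 1.0;
# the train-time output matches it in expectation
print(out_train.mean())  # close to 1.0
```

Dividing by p at training time (rather than multiplying by p at test time) keeps the test-time forward pass completely free of dropout logic.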

### 2.2 Forward pass


```
Inputs:
- x: Input data, of any shape
- dropout_param: A dictionary with the following keys:
  - p: Dropout parameter. We keep each neuron output with probability p.
  - mode: 'test' or 'train'. If the mode is train, then perform dropout;
    if the mode is test, then just return the input.
  - seed: Seed for the random number generator. Passing seed makes this
    function deterministic, which is needed for gradient checking but not
    in real networks.

Returns:
- out: Array of the same shape as x.
- cache: tuple (dropout_param, mask). In training mode, mask is the dropout
  mask that was used to multiply the input; in test mode, mask is None.
```

```python
def dropout_forward(x, dropout_param):
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])

    mask = None
    out = None
    if mode == 'train':
        # keep each neuron with probability p; divide by p (inverted dropout)
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
    elif mode == 'test':
        # at test time dropout is the identity
        out = x

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)
    return out, cache
```

### 2.3 Backward pass

```
Inputs:
- dout: Upstream derivatives, of any shape
- cache: (dropout_param, mask) from dropout_forward.

Returns:
- dx: Gradient with respect to the input x
```

```python
def dropout_backward(dout, cache):
    dropout_param, mask = cache
    mode = dropout_param['mode']

    dx = None
    if mode == 'train':
        # only the neurons kept in the forward pass receive gradient
        dx = dout * mask
    elif mode == 'test':
        dx = dout
    return dx
```
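Because the forward pass multiplies by a fixed mask, the backward pass is just `dout * mask`; a centered finite-difference check on a tiny array confirms this (a standalone sketch using an explicit mask, not the assignment's code):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((4, 5))
dout = rng.standard_normal((4, 5))
p = 0.8

# A fixed inverted-dropout mask, as dropout_forward would cache it
mask = (rng.random(x.shape) < p) / p

def f(xv):
    # Scalar surrogate loss: the dropout output weighted by dout
    return np.sum((xv * mask) * dout)

# Analytic backward pass
dx = dout * mask

# Centered finite differences
h = 1e-6
dx_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += h
    xm[idx] -= h
    dx_num[idx] = (f(xp) - f(xm)) / (2 * h)

print(np.allclose(dx, dx_num, atol=1e-5))
```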

## 3. A Fully Connected Network with an Arbitrary Number of Hidden Layers

The network to implement has the architecture:

```
{affine - [batch/layer norm] - relu - [dropout]} x (L - 1) - affine - softmax
```

```
Initialize a new fully connected network.

Inputs:
- hidden_dims: A list of integers giving the size of each hidden layer.
- input_dim: An integer giving the size of the input.
- num_classes: An integer giving the number of classes to classify.
- dropout: Scalar between 0 and 1 giving the dropout strength. If dropout=1
  then the network should not use dropout at all.
- normalization: What type of normalization the network should use. Valid values
  are "batchnorm", "layernorm", or None for no normalization (the default).
- reg: Scalar giving L2 regularization strength.
- weight_scale: Scalar giving the standard deviation for random
  initialization of the weights.
- dtype: A numpy datatype object; all computations will be performed using
  this datatype. float32 is faster but less accurate, so you should use
  float64 for numeric gradient checking.
- seed: If not None, then pass this random seed to the dropout layers. This
  will make the dropout layers deterministic so we can gradient check the
  model.
```

```python
class FullyConnectedNet(object):
    def __init__(self, hidden_dims, input_dim=3*32*32, num_classes=10,
                 dropout=1, normalization=None, reg=0.0,
                 weight_scale=1e-2, dtype=np.float32, seed=None):
        self.normalization = normalization
        self.use_dropout = dropout != 1
        self.reg = reg
        self.num_layers = 1 + len(hidden_dims)
        self.dtype = dtype
        self.params = {}
        ############################################################################
        num_neurons = [input_dim] + hidden_dims + [num_classes]
        # num_neurons has num_layers + 1 entries, so looping len(num_neurons) - 1
        # times creates exactly one (W, b) pair per affine layer
        for i in range(len(num_neurons) - 1):
            self.params['W' + str(i + 1)] = np.random.randn(num_neurons[i], num_neurons[i+1]) * weight_scale
            self.params['b' + str(i + 1)] = np.zeros(num_neurons[i+1])
            # batch normalization only sits between layers (three nodes have just
            # two gaps), so the final layer gets no gamma/beta
            if self.normalization == 'batchnorm' and i < len(num_neurons) - 2:
                self.params['beta' + str(i + 1)] = np.zeros([num_neurons[i+1]])
                self.params['gamma' + str(i + 1)] = np.ones([num_neurons[i+1]])
        ############################################################################
        self.dropout_param = {}
        if self.use_dropout:
            self.dropout_param = {'mode': 'train', 'p': dropout}
            if seed is not None:
                self.dropout_param['seed'] = seed
        self.bn_params = []
        if self.normalization == 'batchnorm':
            self.bn_params = [{'mode': 'train'} for i in range(self.num_layers - 1)]
        if self.normalization == 'layernorm':
            self.bn_params = [{} for i in range(self.num_layers - 1)]

        # Cast all parameters to the correct datatype
        for k, v in self.params.items():
            self.params[k] = v.astype(dtype)
```
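To see the indexing in the loop above at work, this standalone sketch (with a hypothetical `hidden_dims=[100, 50]`) builds the same parameter dictionary and prints the resulting shapes:

```python
import numpy as np

# Hypothetical configuration mirroring the constructor above:
# 3072-dim input -> 100 -> 50 -> 10 classes
input_dim, hidden_dims, num_classes = 3 * 32 * 32, [100, 50], 10
num_neurons = [input_dim] + hidden_dims + [num_classes]

params = {}
for i in range(len(num_neurons) - 1):
    # W_i maps layer i's width to layer i+1's width; b_i matches the output width
    params['W' + str(i + 1)] = np.random.randn(num_neurons[i], num_neurons[i + 1]) * 1e-2
    params['b' + str(i + 1)] = np.zeros(num_neurons[i + 1])

for name in sorted(params):
    print(name, params[name].shape)
```

With two hidden layers the loop runs three times, producing W1 of shape (3072, 100), W2 of shape (100, 50), and W3 of shape (50, 10): exactly L = num_layers affine stages.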

```
Inputs:
- X: Array of input data of shape (N, d_1, ..., d_k)
- y: Array of labels, of shape (N,). y[i] gives the label for X[i].

If y is None, then run a test-time forward pass of the model and return:
- scores: Array of shape (N, C) giving classification scores, where
  scores[i, c] is the classification score for X[i] and class c.
```

```
{affine - [batch/layer norm] - relu - [dropout]} x (L - 1) - affine - softmax
```

```python
def loss(self, X, y=None):
    X = X.astype(self.dtype)
    mode = 'test' if y is None else 'train'
    if self.use_dropout:
        self.dropout_param['mode'] = mode
    if self.normalization == 'batchnorm':
        for bn_param in self.bn_params:
            bn_param['mode'] = mode

    cache = {}
    scores = X
    ############################################################################
    # Forward pass:
    # {affine - [batch/layer norm] - relu - [dropout]} x (L - 1) - affine - softmax
    for i in range(1, self.num_layers + 1):
        scores, cache['fc'+str(i)] = affine_forward(scores, self.params['W' + str(i)], self.params['b' + str(i)])
        # Do not add relu, batchnorm, dropout after the last layer
        if i < self.num_layers:
            if self.normalization == "batchnorm":
                # self.bn_params[i-1]: bn_params is indexed from 0, layers from 1
                scores, cache['bn'+str(i)] = batchnorm_forward(scores, self.params['gamma'+str(i)], self.params['beta'+str(i)], self.bn_params[i-1])
            scores, cache['relu'+str(i)] = relu_forward(scores)
            if self.use_dropout:
                scores, cache['dropout'+str(i)] = dropout_forward(scores, self.dropout_param)
    ############################################################################

    # If test mode return early
    if mode == 'test':
        return scores

    ############################################################################
    # Compute the loss: softmax data loss plus L2 regularization on every W
    loss, dscores = softmax_loss(scores, y)
    loss += 0.5 * self.reg * sum([np.sum(self.params['W' + str(i)]**2) for i in range(1, self.num_layers + 1)])
    ############################################################################

    ############################################################################
    # Backward pass, walking the layers in reverse order
    grads = {}
    dout = dscores
    for i in range(self.num_layers, 0, -1):
        if i < self.num_layers:  # No ReLU, dropout, batchnorm for the last layer
            if self.use_dropout:
                dout = dropout_backward(dout, cache['dropout'+str(i)])
            dout = relu_backward(dout, cache['relu'+str(i)])
            if self.normalization == "batchnorm":
                dout, grads['gamma'+str(i)], grads['beta'+str(i)] = batchnorm_backward(dout, cache['bn'+str(i)])
        dout, grads['W'+str(i)], grads['b'+str(i)] = affine_backward(dout, cache['fc'+str(i)])
        grads['W'+str(i)] += self.reg * self.params['W'+str(i)]
    ############################################################################
    return loss, grads
```
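`loss` relies on `softmax_loss` from the assignment's layer library; a numerically stable standalone version (a sketch consistent with how it is used above, not necessarily the assignment's exact code) could look like this:

```python
import numpy as np

def softmax_loss(scores, y):
    """Average cross-entropy loss and its gradient w.r.t. the scores."""
    N = scores.shape[0]
    # Shift for numerical stability before exponentiating
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()
    # Gradient: softmax probabilities minus one-hot labels, averaged over N
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    return loss, dscores

# Sanity check: uniform scores over C classes give a loss of log(C)
scores = np.zeros((4, 10))
y = np.array([0, 1, 2, 3])
loss, dscores = softmax_loss(scores, y)
print(np.isclose(loss, np.log(10)))
```

The log(C) value for all-zero scores is a useful check early in training: a freshly initialized 10-class network should report a loss near 2.3.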

## 4. Training the Model

```python
hidden_dims = [100] * 4
range_weight_scale = [1e-2, 2e-2, 5e-3]
range_lr = [1e-5, 5e-4, 1e-5]

best_val_acc = -1
best_weight_scale = 0
best_lr = 0

print("Training...")

# Grid search over weight scale and learning rate
for weight_scale in range_weight_scale:
    for lr in range_lr:
        model = FullyConnectedNet(hidden_dims=hidden_dims, reg=0.0,
                                  weight_scale=weight_scale)
        solver = Solver(model, data,
                        optim_config={'learning_rate': lr},
                        batch_size=100, num_epochs=5,
                        verbose=False)
        solver.train()
        val_acc = solver.best_val_acc
        print('Weight_scale: %f, lr: %f, val_acc: %f' % (weight_scale, lr, val_acc))

        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_weight_scale = weight_scale
            best_lr = lr
            best_model = model

print("Best val_acc: %f" % best_val_acc)
print("Best weight_scale: %f" % best_weight_scale)
print("Best lr: %f" % best_lr)
```

```
Validation set accuracy:  0.528
Test set accuracy:  0.527
```
