# From 0 to 1: Implementing a Vanilla RNN (Sequence Analysis)

The RNN is a core building block of deep learning algorithms. To understand it properly, we work through its motivation, structure, backpropagation, and learning strategy step by step; we then implement an RNN model without relying on any deep learning framework, and apply it to analyzing and predicting time-series data to validate the model.

## RNN Model Structure

Take a sentence as a character-level sequence, for example (the Chinese for "Saturday morning, rare good weather, just right for ..."):

[周, 六, 早, 上, 难, 得, 好, 天, 气, \,, 正, 好, 可, 以, ...]

The RNN input X is a D-dimensional vector and W is a weight matrix. At any timestep t, the raw (pre-activation) output of a hidden node is the weighted sum of the input $x_t$ at that timestep and the hidden output $h_{t-1}$ from timestep t-1:

$$z_t = W_{hx}\, x_t + W_{hh}\, h_{t-1} + b, \qquad h_t = \tanh(z_t)$$

In the RNN expression, $W_{hx}$ and $W_{hh}$ can also be concatenated into a single weight matrix $W$; correspondingly, the current input and the previous hidden output are concatenated into $[x_t, h_{t-1}]$, and a single affine transform is applied with $W$:

$$h_t = \tanh\big(W\,[x_t, h_{t-1}] + b\big), \qquad W = [W_{hx}, W_{hh}]$$
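The equivalence of the two formulations can be checked numerically. A minimal sketch (the sizes N, D, H are arbitrary illustrative choices), using NumPy's row-vector convention as the code later in this article does:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H = 2, 3, 4          # batch size, input dim, hidden dim (illustrative)
x  = rng.standard_normal((N, D))
h  = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b  = rng.standard_normal(H)

# Two-matrix form: x @ Wx + h_prev @ Wh + b
z1 = x @ Wx + h @ Wh + b

# Concatenated form: stack [x, h_prev] column-wise and [Wx; Wh] row-wise
xh = np.concatenate([x, h], axis=1)     # shape (N, D + H)
W  = np.concatenate([Wx, Wh], axis=0)   # shape (D + H, H)
z2 = xh @ W + b

print(np.allclose(z1, z2))  # True
```

With row vectors, concatenating the inputs along columns and stacking the weight matrices along rows distributes back into exactly the two-term sum, so the two forms are interchangeable.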

Before moving on to backpropagation, a short detour. Deep learning frameworks hide gradient computation behind automatic differentiation, but like all abstractions, this one leaks, which is exactly why we implement the RNN by hand.

All non-trivial abstractions, to some degree, are leaky. --Joel Spolsky, CEO of Stack Overflow, 2002

Joel Spolsky used his Hollywood Express analogy to vividly illustrate how TCP's abstraction over IP leaks:

Imagine that we had a way of sending actors from Broadway to Hollywood that involved putting them in cars and driving them across the country. Some of these cars crashed, killing the poor actors. Sometimes the actors got drunk on the way and shaved their heads or got nasal tattoos, thus becoming too ugly to work in Hollywood, and frequently the actors arrived in a different order than they had set out, because they all took different routes. Now imagine a new service called Hollywood Express, which delivered actors to Hollywood, guaranteeing that they would (a) arrive (b) in order (c) in perfect condition. The magic part is that Hollywood Express doesn’t have any method of delivering the actors, other than the unreliable method of putting them in cars and driving them across the country. Hollywood Express works by checking that each actor arrives in perfect condition, and, if he doesn’t, calling up the home office and requesting that the actor’s identical twin be sent instead. If the actors arrive in the wrong order Hollywood Express rearranges them. If a large UFO on its way to Area 51 crashes on the highway in Nevada, rendering it impassable, all the actors that went that way are rerouted via Arizona and Hollywood Express doesn’t even tell the movie directors in California what happened. To them, it just looks like the actors are arriving a little bit more slowly than usual, and they never even hear about the UFO crash.

TCP tries to present a complete abstraction over an unreliable lower layer, yet it cannot truly shield you from lower-layer problems. The same is true of the gradient machinery that deep learning frameworks abstract away.

Compared with a CNN, backpropagation in an RNN has an extra path that propagates error backward through time, hence the name BPTT (back-propagation through time). Let us break it down step by step.

When the error E propagates back to the raw (pre-activation) output of a hidden node, it arrives from two directions, the layer above and the later timestep; applying the chain rule gives:

$$\frac{\partial E}{\partial z_t} = \left(\frac{\partial E}{\partial h_t}\bigg|_{\text{layer}} + \frac{\partial E}{\partial h_t}\bigg|_{\text{time}}\right) \odot \left(1 - h_t \odot h_t\right)$$

since $h_t = \tanh(z_t)$ and $\tanh'(z) = 1 - \tanh^2(z)$.

## The Gradient Propagation Problem in RNNs

An RNN shares the same weight parameters across all timesteps of a layer, unrolling the computation step by step over time. Looking at the backward pass, the gradient flowing backward through time is multiplied at every hidden node by the same matrix $W_{hh}$ (together with the tanh local gradient); repeated products with the same matrix make the gradient shrink or grow geometrically, which is the vanishing/exploding gradient problem [5][6].
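A small hypothetical experiment illustrates this: scale $W_{hh}$ so that its largest singular value is below 1, then apply the recurrent factor repeatedly, as the time-direction backward pass does (the tanh local gradient, which is at most 1, would only shrink the product further):

```python
import numpy as np

rng = np.random.default_rng(1)
H, T = 8, 50  # hidden size and number of timesteps (illustrative)

# Scale Wh so its largest singular value is 0.9; repeated multiplication
# by such a matrix drives any vector's norm toward zero geometrically.
Wh = rng.standard_normal((H, H))
Wh *= 0.9 / np.linalg.svd(Wh, compute_uv=False)[0]

g = rng.standard_normal(H)      # a gradient arriving at the last timestep
norms = []
for _ in range(T):
    g = g @ Wh.T                # the recurrent factor applied at every step
    norms.append(np.linalg.norm(g))

print(norms[0], norms[-1])      # the norm decays roughly geometrically
```

Scaling Wh so the top singular value exceeds 1 makes the same norms grow geometrically instead: vanishing and exploding gradients are two sides of the same repeated product.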

```python
def rnn_step_forward(self, x, prev_h, Wx, Wh, b):
    """
    Forward pass for a single timestep of a vanilla RNN.

    Inputs:
    - x: Input data for this timestep, of shape (N, D)
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    # Affine transform of the input and the previous hidden state, then tanh
    z = np.matmul(x, Wx) + np.matmul(prev_h, Wh) + b
    next_h = np.tanh(z)
    # Local derivative of tanh, cached for the backward pass
    dtanh = 1. - next_h * next_h
    cache = (x, prev_h, Wx, Wh, dtanh)
    return next_h, cache
```

```python
def rnn_step_backward(self, dnext_h, cache):
    """
    Backward pass for a single timestep of a vanilla RNN.

    Inputs:
    - dnext_h: Gradient of loss with respect to next hidden state, of shape (N, H)
    - cache: Cache object from the forward pass

    Returns a tuple of:
    - dx: Gradients of input data, of shape (N, D)
    - dprev_h: Gradients of previous hidden state, of shape (N, H)
    - dWx: Gradients of input-to-hidden weights, of shape (D, H)
    - dWh: Gradients of hidden-to-hidden weights, of shape (H, H)
    - db: Gradients of bias vector, of shape (H,)
    """
    x, prev_h, Wx, Wh, dtanh = cache
    # Push the upstream gradient through the tanh nonlinearity
    dz = dnext_h * dtanh
    dx = np.matmul(dz, Wx.T)
    dprev_h = np.matmul(dz, Wh.T)
    dWx = np.matmul(x.T, dz)
    dWh = np.matmul(prev_h.T, dz)
    # Bias gradient sums over the batch dimension
    db = np.sum(dz, axis=0)
    return dx, dprev_h, dWx, dWh, db
```
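The two step functions chain into full BPTT. Below is a standalone sketch (with `self` dropped and illustrative sizes): at each timestep the hidden-state gradient is the sum of the layer-direction error `dh[t]` and the time-direction error `dprev_h`, and the weight gradients accumulate across timesteps because the parameters are shared:

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    return next_h, (x, prev_h, Wx, Wh, 1. - next_h * next_h)

def rnn_step_backward(dnext_h, cache):
    x, prev_h, Wx, Wh, dtanh = cache
    dz = dnext_h * dtanh
    return dz @ Wx.T, dz @ Wh.T, x.T @ dz, prev_h.T @ dz, dz.sum(axis=0)

rng = np.random.default_rng(2)
N, D, H, T = 2, 3, 4, 5    # batch, input dim, hidden dim, timesteps
xs = rng.standard_normal((T, N, D))
h0 = np.zeros((N, H))
Wx, Wh = rng.standard_normal((D, H)), rng.standard_normal((H, H))
b = np.zeros(H)

# Forward: unroll over time, keeping each step's cache
h, caches = h0, []
for t in range(T):
    h, cache = rnn_step_forward(xs[t], h, Wx, Wh, b)
    caches.append(cache)

# Backward (BPTT): at every step the hidden gradient is the sum of the
# layer-direction error dh[t] and the time-direction error dprev_h
dh = rng.standard_normal((T, N, H))     # upstream gradients per timestep
dWx, dWh, db = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(b)
dprev_h = np.zeros((N, H))
for t in reversed(range(T)):
    dx, dprev_h, dWx_t, dWh_t, db_t = rnn_step_backward(dh[t] + dprev_h, caches[t])
    dWx += dWx_t; dWh += dWh_t; db += db_t   # weights shared across time

print(dWx.shape, dWh.shape, db.shape)
```

Note how `dprev_h` is exactly the time-direction term from the chain-rule expression earlier: it carries the error from timestep t back into timestep t-1.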


By introducing a memory mechanism, the RNN can extract sequential features, and its expressive power can be raised by adding hidden nodes and hidden layers, opening a new direction for deep learning algorithms and their applications. However, in sequence data, samples at different distances often carry different weights in the output decision; moreover, as the depth of an RNN grows, the gradient propagation problem makes the model hard to train.

(The End)

[1] The Stanford CS class CS231n.

[2] Andrej Karpathy. The Unreasonable Effectiveness of Recurrent Neural Networks.

[3] Andrej Karpathy. Yes you should understand backprop.

[4] Joel Spolsky. The Law of Leaky Abstractions.

[5] Bengio, Y., Simard, P., and Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.

[6] Pascanu, R., Mikolov, T., and Bengio, Y. Understanding the exploding gradient problem. CoRR abs/1211.5063, 2012.

[7] Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA, 79(8):2554–2558, April 1982.

