Activation

刘笑江

Published 2018-05-28 12:09:32


This post introduces and compares activation functions for neurons.

Sgn

Step (sign) function

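Pros: the ideal activation function

Cons: discontinuous, with no derivative at $$x=0$$ and a zero derivative everywhere else, so it is hard to optimize with gradients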
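A minimal sketch of why sgn is hard to optimize, assuming plain Python floats and a central finite difference as a stand-in for the derivative (the helpers `sgn` and `numeric_grad` are illustrative, not from any library). Away from 0 the slope is exactly 0, so gradient descent receives no signal; at 0 the difference quotient grows like 1/h instead of converging.

```python
def sgn(x: float) -> float:
    """Sign function: -1 for x < 0, 0 at x == 0, +1 for x > 0."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Slope is 0 away from the origin and explodes (about 1/h) right at it.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  sgn={sgn(x):+.0f}  slope~{numeric_grad(sgn, x):.3g}")
```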

Sigmoid


\begin{aligned} \sigma(x)&=\frac{e^x}{1+e^x} \in (0, 1) \\ \sigma'(x)&=\sigma(x)(1-\sigma(x)) \end{aligned}

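Pros:

The derivative is expressed in terms of the function itself

Cons:

Saturated neurons cause vanishing gradients (for $$|x|>5$$ the gradient is tiny, so updates are slow)
exp is somewhat expensive to compute
Not zero-centered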
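A minimal Python sketch (stdlib `math` only; the function names are illustrative) that evaluates σ and σ' = σ(1 - σ) to show how quickly the gradient shrinks as |x| grows: it peaks at 0.25 at x = 0 and is already below 0.01 at x = 5. The branch on the sign of x is an assumed implementation detail that keeps `math.exp` from overflowing for large |x|.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid e^x / (1 + e^x), written in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))  # exp of a non-positive number cannot overflow
    z = math.exp(x)  # x < 0, so exp(x) < 1
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative expressed through the function itself: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:>4}  sigma={sigmoid(x):.4f}  grad={sigmoid_grad(x):.6f}")
```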

Tanh

\begin{aligned} \tanh(x) &=2 \cdot \sigma (2x) - 1 \in (-1, 1) \\ \tanh'(x) &= 1 - \tanh^2(x) \end{aligned}

Pros:

Larger gradient than Sigmoid at $$x=0$$ (1 vs. 0.25)
Zero-centered

Cons:

Vanishing gradients when saturated
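A short check, in plain Python with illustrative helper names, of the identity tanh(x) = 2σ(2x) - 1 against `math.tanh`, and of the claim that tanh has a steeper gradient than sigmoid at the origin. The naive sigmoid here is an assumption that is only adequate for the moderate inputs used.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))  # naive form, fine for the moderate inputs used here


def tanh_via_sigmoid(x: float) -> float:
    """tanh(x) = 2 * sigma(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def tanh_grad(x: float) -> float:
    """tanh'(x) = 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t


# The identity holds up to floating-point rounding.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    assert math.isclose(tanh_via_sigmoid(x), math.tanh(x), abs_tol=1e-12)

print("tanh'(0)  =", tanh_grad(0.0))                        # 1.0
print("sigma'(0) =", sigmoid(0.0) * (1.0 - sigmoid(0.0)))   # 0.25
```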

ReLU

Krizhevsky et al. 2012 [7]

f(x)=\max(0, x) \in [0, +\infty)

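Pros:

Closer to biological principles
Sparsity (the output is exactly 0 for $$x \leq 0$$)
Does not saturate, avoiding vanishing gradients (for $$x \gg 0$$)
Faster to compute
Converges faster than sigmoid

Cons:

Not zero-centered
For $$x<0$$ the gradient is zero, so the gradient vanishes there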
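A minimal sketch (plain Python, hypothetical helper names and inputs) of ReLU and its subgradient. It illustrates both sides of the lists above: the gradient stays 1 no matter how large a positive input is, while every non-positive input produces an output of exactly 0 with a gradient of 0. The value of the gradient at x = 0 is a convention; 0 is assumed here.

```python
def relu(x: float) -> float:
    """f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Subgradient: 1 for x > 0, 0 otherwise (the value at x == 0 is a convention)."""
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 0.3, 4.0, 100.0]
outputs = [relu(x) for x in inputs]
grads = [relu_grad(x) for x in inputs]

# Sparsity: every non-positive input maps to exactly 0.
zeros = sum(1 for y in outputs if y == 0.0)
print(outputs)
print(grads)
print(f"{zeros}/{len(inputs)} outputs are zero")
```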

Leaky ReLU

[Maas et al., 2013], [He et al., 2015]

f(x)=\max(0.01x, x)

Pros:

Does not saturate, so it avoids the vanishing gradient for $$x<0$$
Fast to compute
Converges faster than sigmoid / tanh (6x)
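A minimal sketch of Leaky ReLU in plain Python, assuming the slope 0.01 from the formula above (the parameter name `slope` is illustrative). The only change from ReLU is on the negative side: the gradient there is `slope` instead of 0, so a unit with negative pre-activations still gets updated.

```python
def leaky_relu(x: float, slope: float = 0.01) -> float:
    """f(x) = max(slope * x, x); for 0 < slope < 1 this is x for x > 0 and slope * x otherwise."""
    return max(slope * x, x)


def leaky_relu_grad(x: float, slope: float = 0.01) -> float:
    """Gradient is 1 on the positive side and `slope` (not 0) on the negative side."""
    return 1.0 if x > 0 else slope


for x in (-10.0, -1.0, 0.5, 10.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```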

PReLU

f(x)=\max(x,ax), a \leq 1

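Parametric ReLU was proposed by [He et al., 2015] [4]. Unlike Leaky ReLU, the slope $$a$$ is a parameter learned during training; the resulting network surpassed human-level accuracy on the ImageNet 2012 classification dataset.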
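To illustrate that $$a$$ is learned like any other weight, here is a toy sketch in plain Python. The data, target slope 0.2, learning rate, and helper names are all hypothetical; a real network would get these gradients from backpropagation rather than a hand-written loop. Since $$f(x)=\max(x, ax)$$ with $$a \leq 1$$ equals $$x$$ for $$x>0$$ and $$ax$$ otherwise, $$\partial f / \partial a$$ is $$x$$ on the negative side and 0 on the positive side.

```python
def prelu(x: float, a: float) -> float:
    """f(x) = max(x, a*x) with a <= 1: x for x > 0, a*x otherwise."""
    return x if x > 0 else a * x


def prelu_grads(x: float, a: float):
    """Return (df/dx, df/da). The slope a only receives gradient from x <= 0."""
    if x > 0:
        return 1.0, 0.0
    return a, x


# Hypothetical toy data: fit a so that prelu(x, a) matches 0.2 * x on negative inputs,
# using plain gradient descent on the mean squared error.
xs = [-3.0, -1.0, -0.5, -2.0]
targets = [0.2 * x for x in xs]
a, lr = 0.01, 0.05  # initial slope and learning rate (assumed values)
for _ in range(200):
    grad_a = 0.0
    for x, t in zip(xs, targets):
        err = prelu(x, a) - t
        grad_a += 2 * err * prelu_grads(x, a)[1]  # chain rule: dL/da = 2 * err * df/da
    a -= lr * grad_a / len(xs)

print(f"learned a = {a:.4f}")
```

On this toy problem the loop should drive `a` to within rounding of 0.2.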

ELU

[Exponential Linear Units, Clevert et al., 2015]

f(x)=\begin{cases} x &\text{if } x \gt 0 \\ \alpha (\exp (x) - 1) &\text{otherwise} \end{cases}
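A minimal sketch of ELU in plain Python, assuming α = 1.0 as an illustrative default (the names `elu` and `elu_grad` are hypothetical). It uses `math.expm1` for $$\exp(x) - 1$$, which is more accurate for small |x|, and reuses the output for the derivative: for $$x \leq 0$$, $$\frac{d}{dx}\alpha(\exp(x) - 1) = \alpha \exp(x) = f(x) + \alpha$$. Unlike ReLU, the negative side keeps a nonzero gradient, and its outputs level off smoothly toward $$-\alpha$$.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)


def elu_grad(x: float, alpha: float = 1.0) -> float:
    """1 for x > 0; for x <= 0 the derivative alpha * exp(x) equals elu(x) + alpha."""
    return 1.0 if x > 0 else elu(x, alpha) + alpha


for x in (-10.0, -1.0, 0.0, 2.0):
    print(f"x={x:+5.1f}  elu={elu(x):+.4f}  grad={elu_grad(x):.4f}")
```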

Maxout

Proposed by Ian J. Goodfellow et al. at ICML 2013 [5]

\max (w_1^T x + b_1, w_2^T x + b_2)

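Pros:

Generalizes ReLU and Leaky ReLU
Operates in a linear regime: it does not saturate and does not die

Cons:

Twice as many parameters (each neuron has two sets of weights and biases)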
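A minimal sketch of a two-piece Maxout unit in plain Python, showing how it generalizes ReLU and Leaky ReLU. The weights `w`, bias `b`, and helper names are hypothetical. Fixing the second piece at zero recovers ReLU, and making it 0.01 times the first piece recovers Leaky ReLU; in general both pieces are learned, which is where the doubled parameter count comes from.

```python
def dot(w, x):
    """Inner product w^T x of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def maxout(x, pieces):
    """max_k (w_k^T x + b_k) over a list of (w_k, b_k) affine pieces."""
    return max(dot(wk, x) + bk for wk, bk in pieces)


w, b = [0.5, -1.0], 0.2  # hypothetical weights and bias for one neuron

# Second piece fixed at 0 -> ReLU(w^T x + b).
relu_pieces = [(w, b), ([0.0, 0.0], 0.0)]
# Second piece is 0.01 times the first -> LeakyReLU(w^T x + b) with slope 0.01.
leaky_pieces = [(w, b), ([0.01 * wi for wi in w], 0.01 * b)]

for x in ([1.0, 0.0], [0.0, 1.0]):
    z = dot(w, x) + b
    print(x, maxout(x, relu_pieces), max(0.0, z), maxout(x, leaky_pieces), max(0.01 * z, z))
```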

Noisy ReLU

f(x)=\max(0, x+Y), \quad Y \sim \mathcal{N} (\mu, \sigma^2)

Noisy ReLUs have been used with some success in restricted Boltzmann machines for computer vision tasks. [3]
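A minimal sketch of the Noisy ReLU formula above using Python's `random` module, assuming fixed μ and σ (the input 0.5, σ = 0.5, seed, and helper name are hypothetical choices for illustration). The output is never negative, but the added noise sometimes pushes a positive input below zero, where it is clipped to 0.

```python
import random


def noisy_relu(x, mu=0.0, sigma=1.0, rng=None):
    """max(0, x + Y) with Y ~ N(mu, sigma^2); sigma is a standard deviation."""
    rng = random if rng is None else rng
    return max(0.0, x + rng.gauss(mu, sigma))


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [noisy_relu(0.5, sigma=0.5, rng=rng) for _ in range(10_000)]
frac_zero = sum(s == 0.0 for s in samples) / len(samples)

print("min:", min(samples))
print("fraction zeroed:", frac_zero)  # theory: P(0.5 + 0.5 * Z <= 0) = P(Z <= -1), about 0.16
print("mean:", sum(samples) / len(samples))
```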

TLDR

Practical advice from CS231n Lecture 5 [6]:

Use ReLU, but be careful with the learning rate
Try Leaky ReLU / Maxout / ELU
Try tanh, but don't expect much
Don't use sigmoid

Reference

[1] http://ufldl.stanford.edu/wiki/index.php/神经网络

[2] https://en.wikipedia.org/wiki/Rectifier_(neural_networks)

[3] Nair, V.; Hinton, G. E. (2010). "Rectified Linear Units Improve Restricted Boltzmann Machines". ICML.

[4] He, K.; Zhang, X.; Ren, S.; Sun, J. (2015). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification".

[5] Goodfellow, I. J.; Warde-Farley, D.; Mirza, M.; et al. (2013). "Maxout Networks". arXiv preprint arXiv:1302.4389.

[6] CS231n (Winter 2016), Lecture 5: Neural Networks. Lecture video and slides.

[7] Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks". Advances in Neural Information Processing Systems, pp. 1097-1105.

[8] Zhihu question: "What exactly is the role of the activation function in an artificial neural network? Why is ReLU better than tanh and sigmoid?" https://www.zhihu.com/question/29021768

Originally published: 2017-08-11
