# An Intuitive Guide to Key Deep Learning Concepts: Optimization Algorithms, Hyperparameter Tuning, and Regularization

[Xinzhiyuan editor's note] Deep learning papers abound, but understanding them presupposes a grasp of the fundamentals. This article gives an intuitive, systematic overview of common concepts and core ideas across the subfields of deep learning, so that readers can build an intuitive understanding of the field's key concepts and lower the barrier to reading papers and applying the techniques in practice.

Xavier initialization: Sample the weights from a Gaussian or uniform distribution such that their variance is 1/n, where n is the number of input neurons. The derivation assumes a linear activation function.

He initialization (MSRA initialization): Sample the weights from a Gaussian or uniform distribution such that their variance is 2/n. The derivation assumes a ReLU activation. Because ReLU zeroes out negative pre-activations, roughly half of the neurons are set to zero; to compensate for the lost signal, the variance is doubled.
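The two schemes above differ only in the target variance. A minimal NumPy sketch (the function names and shapes here are illustrative, not from the original article):

```python
import numpy as np

def xavier_init(n_in, n_out, rng):
    # Gaussian weights with variance 1/n_in, as derived for linear activations
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out))

def he_init(n_in, n_out, rng):
    # Gaussian weights with variance 2/n_in, compensating for ReLU
    # zeroing out roughly half of the neurons
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

rng = np.random.default_rng(0)
W_xavier = xavier_init(1000, 500, rng)
W_he = he_init(1000, 500, rng)

print(W_xavier.var())  # close to 1/1000 = 0.001
print(W_he.var())      # close to 2/1000 = 0.002
```

With enough samples the empirical variance matches the target closely, which is the whole point: the scale of the weights is chosen so that activation variance stays roughly constant across layers.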

L2 regularization: L2 regularization pushes the network's weights toward zero. This weakens the influence of each neuron on the next layer, simplifies the network, shrinks its effective size, and lowers its capacity to fit the data. Because L2 regularization amounts to a linear (multiplicative) decay of the weights at each update, it is also known as weight decay.
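The equivalence can be seen in a single SGD step: adding the penalty (λ/2)·‖w‖² to the loss adds λw to the gradient, which is the same as first shrinking w by a factor (1 − ηλ) and then taking the plain gradient step. A sketch using a hypothetical quadratic loss chosen only for illustration:

```python
# Hypothetical loss L(w) = 0.5 * (w - 3)^2; its gradient is w - 3
def grad_loss(w):
    return w - 3.0

lr, lam = 0.1, 0.01  # learning rate eta, regularization strength lambda
w = 5.0

# One SGD step on the L2-regularized loss L(w) + (lam / 2) * w**2
w_reg = w - lr * (grad_loss(w) + lam * w)

# Equivalent "weight decay" form: shrink w first, then take the plain step
w_decay = w * (1 - lr * lam) - lr * grad_loss(w)

print(w_reg == w_decay)  # True: the two updates are algebraically identical
```

This is why the two terms are used interchangeably for plain SGD (with adaptive optimizers such as Adam the two formulations are no longer equivalent).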

```python
import random

# Sample hyperparameters on a log scale
learning_rate = 10 ** random.uniform(-5, -1)   # from 1e-5 to 1e-1
weight_decay = 10 ** random.uniform(-7, -1)    # from 1e-7 to 1e-1
momentum = 1 - 10 ** random.uniform(-3, -1)    # from 0.9 to 0.999
```
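In a full random search, one would sample many such configurations and keep the one with the lowest validation error. A sketch of that loop, where `validation_error` is a hypothetical stand-in for training a model and evaluating it:

```python
import random

random.seed(0)  # for reproducibility of this demo

def sample_config():
    # Sample each hyperparameter on a log scale, as above
    return {
        "learning_rate": 10 ** random.uniform(-5, -1),
        "weight_decay": 10 ** random.uniform(-7, -1),
        "momentum": 1 - 10 ** random.uniform(-3, -1),
    }

def validation_error(cfg):
    # Placeholder: a real run would train a network with cfg
    # and return its error on a held-out validation set.
    return abs(cfg["learning_rate"] - 1e-3) + cfg["weight_decay"]

configs = [sample_config() for _ in range(20)]
best = min(configs, key=validation_error)
```

Sampling randomly rather than on a grid tends to explore each individual hyperparameter more thoroughly for the same budget, since no two trials share the same value of any coordinate.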

