
Kaggle Competition: NIPS 2017 Adversarial Learning Challenges

『Kaggle Picks』

  • 《Getting Started with the NIPS 2017 Adversarial Learning Challenges》 Link: https://www.kaggle.com/benhamner/adversarial-learning-challenges-getting-started

『Deep Learning Tips』

《Why does batch normalization help?》

  • From Quora, by Derek Chen (works at Notability)

Batch normalization potentially helps in two ways: faster learning and higher overall accuracy. It also allows you to use a higher learning rate, potentially providing another boost in speed.

Why does this work? Well, we know that normalization (shifting inputs to zero mean and unit variance) is often used as a pre-processing step (http://ufldl.stanford.edu/wiki/index.php/Data_Preprocessing#Data_Normalization) to make the data comparable across features. As the data flows through a deep network, the weights and parameters adjust those values, sometimes making the data too big or too small again - a problem the authors refer to as "internal covariate shift". By normalizing the data in each mini-batch, this problem is largely avoided.

Basically, rather than just performing normalization once in the beginning, you're doing it all over the place. Of course, this is a drastically simplified view of the matter (since for one thing, I'm completely ignoring the post-processing updates applied to the entire network), but hopefully this gives a good high-level overview.
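To make the "normalize in each mini-batch" idea concrete, here is a minimal NumPy sketch of the batch-normalization forward pass; the function name, the toy mini-batch, and the epsilon default are illustrative assumptions, not code from the original answer:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch to zero mean and unit variance per feature,
    then apply a learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                    # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    out = gamma * x_hat + beta             # learned scale/shift keeps representational power
    return out, (x_hat, var, gamma, eps)   # cache what the backward pass needs

# Usage: a mini-batch of 32 examples with 4 features, deliberately shifted and scaled.
x = 10.0 * np.random.randn(32, 4) + 3.0
gamma, beta = np.ones(4), np.zeros(4)
y, cache = batchnorm_forward(x, gamma, beta)
print(y.mean(axis=0), y.std(axis=0))       # roughly zero mean and unit variance
```

Applying this at every layer, mini-batch by mini-batch, is what keeps intermediate activations from drifting "too big or too small" as described above.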

Update: For a more detailed breakdown of gradient calculations, check out: Understanding the backward pass through Batch Normalization Layer (http://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html)
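For reference, the step-by-step gradient derivation in the linked post collapses to the compact expression sketched below. This continues the NumPy sketch above (it reuses its import and cache) and is illustrative, not the linked author's exact code:

```python
def batchnorm_backward(dout, cache):
    """Backward pass through batch normalization (compact form of the
    standard derivation; see the linked post for the step-by-step version)."""
    x_hat, var, gamma, eps = cache
    N = dout.shape[0]
    dbeta = dout.sum(axis=0)              # gradient w.r.t. the shift
    dgamma = (dout * x_hat).sum(axis=0)   # gradient w.r.t. the scale
    # Gradient w.r.t. the inputs: every element of the mini-batch influences
    # the batch mean and variance, hence the two correction terms.
    dx = (gamma / np.sqrt(var + eps)) / N * (N * dout - dbeta - x_hat * dgamma)
    return dx, dgamma, dbeta
```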

  • From Quora, by Shuaib Ahmed S (PhD, machine learning)

Naturally, neural networks, including deep networks, require careful tuning of weight initialization and learning parameters. Batch normalization helps relax these requirements somewhat.

Weights problem:

  • Whatever the initialization of the weights, be it random or empirically chosen, they start far from the learned weights. Consider a mini-batch: during the initial epochs there will be many outliers in the required feature activations, produced by weights that are still far from the required ones.
  • The (deep) neural network by itself is ill-posed, i.e. a small perturbation in the initial layers leads to a large change in the later layers.

During backpropagation, these phenomena distract the gradients: the gradients must first compensate for the outliers before they can learn the weights that produce the required outputs. This requires extra epochs to converge.

Batch normalization keeps these gradients from being distracted by the outliers and lets them flow toward the common goal (by normalizing the activations) within the range of the mini-batch, which accelerates the learning process.

Learning rate problem:

Generally, learning rates are kept small so that only a small portion of the gradient corrects the weights; the reason is that gradients from outlier activations should not disturb already-learned activations. With batch normalization these outlier activations are reduced, so higher learning rates can be used to accelerate the learning process.
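As a rough illustration of this point, the hedged PyTorch sketch below builds the same small MLP with and without a BatchNorm1d layer; the layer sizes and the roughly ten-times-higher learning rate for the normalized model are illustrative assumptions, not values from the original answer:

```python
import torch.nn as nn
from torch.optim import SGD

def make_mlp(use_bn: bool) -> nn.Sequential:
    """A small MLP; optionally normalize hidden activations per mini-batch."""
    layers = [nn.Linear(784, 256)]
    if use_bn:
        layers.append(nn.BatchNorm1d(256))  # per-mini-batch normalization of activations
    layers += [nn.ReLU(), nn.Linear(256, 10)]
    return nn.Sequential(*layers)

plain_model = make_mlp(use_bn=False)
bn_model = make_mlp(use_bn=True)

# Without BN, a conservative learning rate is typically needed for stability;
# with BN, a substantially higher rate is usually tolerated.
opt_plain = SGD(plain_model.parameters(), lr=0.01)
opt_bn = SGD(bn_model.parameters(), lr=0.1)
```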

This article was shared from the WeChat public account 深度学习与数据挖掘实战 (www_datageekers_com), author: fishexpert.


Originally published: 2017-07-14
