
Binary Network Training -- An Empirical Study of Binary Neural Networks' Optimisation

Copyright notice: this is an original post by the blogger and may not be reposted without permission. https://blog.csdn.net/zhangjunhit/article/details/90409501

An Empirical Study of Binary Neural Networks' Optimisation, ICLR 2019. https://github.com/mi-lad/studying-binary-neural-networks

The main conclusions of the paper are as follows:

  (1) ADAM for optimising the objective, (2) not using early stopping, (3) splitting the training into two stages, (4) removing gradient and weight clipping in the first stage, and (5) reducing the averaging rate in Batch Normalisation layers in the second stage.

Two forms of clipping are commonly used when training binary networks: Gradient clipping -- gradients are discarded once they fall outside a certain range; Weight clipping -- weights are kept within a certain range.
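For concreteness, here is a minimal PyTorch-style sketch of the two operations (the function names, the [-1, 1] range and the masking rule are assumptions for illustration; the paper's released code is not reproduced here):

```python
import torch

def clip_gradients(parameters, limit=1.0):
    # "Gradient clipping" as used in BNN training: cancel the gradient of any
    # weight whose full-precision value already lies outside [-limit, limit].
    for p in parameters:
        if p.grad is not None:
            p.grad.mul_((p.detach().abs() <= limit).float())

def clip_weights(parameters, limit=1.0):
    # "Weight clipping": force the full-precision proxy weights to stay in
    # [-limit, limit] after every optimiser step.
    for p in parameters:
        p.data.clamp_(-limit, limit)
```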

The binarisation operation applied in the forward path (and at the end of training):

$$w_b = \operatorname{sign}(w)$$

STE with gradient clipping provides an estimate for the gradient of this operation:

$$\frac{\partial L}{\partial w} \approx \frac{\partial L}{\partial w_b} \cdot \mathbf{1}_{\lvert w \rvert \le 1}$$

How is the binary convolution kernel in figure (a) obtained? The binary kernel comes from binarising a full-precision proxy with the sign function, which corresponds to the forward pass in the figure on the right. Where does this full-precision proxy come from? It is learned through the STE estimator, which corresponds to the backward pass in the figure on the right.
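A minimal PyTorch sketch of this forward/backward pair (an illustration under assumed names, not the paper's implementation):

```python
import torch

class BinariseSTE(torch.autograd.Function):
    """Sign binarisation of a full-precision proxy, with the STE backward."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)                      # forward: w_b = sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # STE with gradient clipping: pass the gradient straight through,
        # but only where the proxy weight lies inside [-1, 1].
        return grad_output * (w.abs() <= 1).float()

# Usage: w is the learned full-precision proxy, w_b is the binary kernel.
w = torch.randn(3, 3, requires_grad=True)
w_b = BinariseSTE.apply(w)
w_b.sum().backward()                              # gradient reaches w via the STE
```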

3.1 Impact of Optimiser

A possible hypothesis is that early stages of training binary models require more averaging for the optimiser to proceed in the presence of the binarisation operation. On the other hand, in the late stages of the training, we rely on noisier sources to increase the exploration power of the optimiser.

Overall, ADAM has the advantage.
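As a point of reference, the only change between the compared runs is the optimiser object (the hyper-parameters below are illustrative assumptions); ADAM's moving averages of the gradient and its square provide the extra averaging that the hypothesis above refers to:

```python
import torch

binary_model = torch.nn.Linear(784, 10)   # stand-in for a binarised network

# ADAM keeps exponential moving averages of the gradient and squared gradient.
adam = torch.optim.Adam(binary_model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Plain momentum SGD, for comparison.
sgd = torch.optim.SGD(binary_model.parameters(), lr=1e-2, momentum=0.9)
```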

3.2 Impact of Gradient and Weight Clipping

Gradient and weight clipping have little effect on the final accuracy of a binary network, but they do have a noticeable effect on how fast training converges.

the well-known observation that training a binary model is often notably slower than its non-binary counterpart

The slow-down is mainly caused by the commonly applied gradient and weight clipping, as they keep parameters within the [-1, 1] range at all times during training.

The hypothesis is that weight and gradient clipping help achieve better accuracy.

We tested this hypothesis by training a binary model in two stages: (1) using vanilla STE in the first stage with higher learning rates and (2) turning clipping back on when the accuracy stops improving by reducing the learning rate.
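A minimal PyTorch-style sketch of this two-stage schedule (epoch counts, learning rates, the BN momentum value and all helper names are illustrative assumptions; the paper switches stages when accuracy stops improving rather than after a fixed number of epochs):

```python
import torch

def run_epoch(model, loader, loss_fn, opt, clip):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        if clip:  # gradient clipping: cancel gradients where |w| > 1
            for p in model.parameters():
                if p.grad is not None:
                    p.grad.mul_((p.detach().abs() <= 1).float())
        opt.step()
        if clip:  # weight clipping: keep the full-precision proxies in [-1, 1]
            for p in model.parameters():
                p.data.clamp_(-1, 1)

def train_two_stages(model, loader, loss_fn, epochs_stage1=60, epochs_stage2=40):
    # Stage 1: vanilla STE, no clipping, higher learning rate, ADAM.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs_stage1):
        run_epoch(model, loader, loss_fn, opt, clip=False)

    # Stage 2: clipping back on, lower learning rate, and a smaller BatchNorm
    # averaging rate (mapped here to PyTorch's `momentum`; the exact
    # correspondence to the paper's setting is an assumption).
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.momentum = 0.01
    for _ in range(epochs_stage2):
        run_epoch(model, loader, loss_fn, opt, clip=True)
```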

