Binary Networks: Training Binary Weight Networks via Semi-Binary Decomposition

Author: 用户1148525

Published 2019-05-28 23:18:23

Training Binary Weight Networks via Semi-Binary Decomposition

ECCV 2018

Methods for compressing or accelerating CNN models fall broadly into three categories: pruning-based methods, low-rank decomposition based methods, and quantization-based methods.

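This paper belongs to the low-rank decomposition based methods. The basic idea is to factorize the real-valued weight matrix W as UDV, where U and V are binary matrices and D is a diagonal matrix. Binary networks started by representing W with a single binary matrix B, then approximated W with αB, where α is a scale factor, and later used several αB terms to represent W; here W is represented as UDV. The motivation is that the representation capacity of a binary network is much weaker than that of a real-valued network, so each step tries to increase that capacity, that is, the range of values the approximated weights can take.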
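The progression described above can be written compactly as follows. This is a notational summary only: the generic shape m × n, the number of terms M, and the inner dimension K are symbols assumed here for illustration.

```latex
\begin{aligned}
W &\approx B, & B &\in \{-1,+1\}^{m \times n} \\
W &\approx \alpha B, & \alpha &\in \mathbb{R} \\
W &\approx \sum_{i=1}^{M} \alpha_i B_i, & \alpha_i &\in \mathbb{R},\; B_i \in \{-1,+1\}^{m \times n} \\
W &\approx U D V = \sum_{k=1}^{K} d_k\, u_k v_k^{\top}, & U &\in \{-1,+1\}^{m \times K},\; D = \operatorname{diag}(d_1,\dots,d_K),\; V \in \{-1,+1\}^{K \times n}
\end{aligned}
```

Here u_k is the k-th column of U and v_k^T is the k-th row of V. Each entry of UDV is a signed sum of the d_k, so it can take many more distinct values than the two values ±α available to αB.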

3 Our method

3.1 Preliminary

For the convolutional layer weights W, the earliest approach binarizes them directly with the sign function.

Later work approximates W with αB.

Problem: all parameters in the same convolutional kernel have the same magnitude α, so the representation capacity is still weak and the range of representable values is still small.
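The two schemes above can be illustrated with a small sketch. It assumes B = sign(W) and uses α = mean(|W|), which is the least-squares scale for that fixed B. The function names and the example kernel are hypothetical, and zero is mapped to +1 by assumption.

```python
def sign(x):
    # Zero is mapped to +1 so every entry stays strictly binary (an assumption).
    return 1.0 if x >= 0 else -1.0


def binarize(kernel):
    """Direct binarization: B = sign(W)."""
    return [sign(w) for w in kernel]


def binarize_scaled(kernel):
    """Scaled binarization: W ~ alpha * B with alpha = mean(|W|)."""
    b = binarize(kernel)
    alpha = sum(abs(w) for w in kernel) / len(kernel)
    return alpha, b


def sq_error(kernel, approx):
    """Squared approximation error between a kernel and its approximation."""
    return sum((w - a) ** 2 for w, a in zip(kernel, approx))


kernel = [0.9, -0.1, 0.4, -0.6]          # a flattened toy kernel
b = binarize(kernel)
alpha, _ = binarize_scaled(kernel)
scaled = [alpha * v for v in b]

print("B        :", b)
print("alpha * B:", scaled)
print("errors   :", sq_error(kernel, b), sq_error(kernel, scaled))
```

The scale factor lowers the error, but every entry of the approximation still has the same magnitude, which is the limitation noted above.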

3.2 Semi-Binary Decomposition

Eq. (3) is hard to solve because of the binary constraints, so the components are learned in a greedy way.

To solve Eq. (4), we propose an alternating optimization method, i.e., iteratively updating one decomposition factor while keeping the other factors fixed.
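A minimal sketch of how greedy learning and alternating updates can fit together is given below. It fits one rank-one term d·u·vᵀ at a time to the current residual and, within each term, updates u, v, and d in turn with the others fixed. The update rules, the initialization, and the names fit_rank_one and semi_binary_decompose are assumptions made for illustration; they are not taken from the paper's Eq. (3) or Eq. (4), whose exact form may differ.

```python
def sign(x):
    # Zero is mapped to +1 so every entry stays strictly binary (an assumption).
    return 1.0 if x >= 0 else -1.0


def fit_rank_one(R, iters=20):
    """Alternating updates that reduce ||R - d * u v^T||_F^2 with u, v in {-1, +1}."""
    m, n = len(R), len(R[0])
    # Initialization (an assumption): signs of the first row of R.
    v = [sign(x) for x in R[0]]
    u, d = [1.0] * m, 0.0
    for _ in range(iters):
        # Fix v, update u: u_i = sign((R v)_i).
        u = [sign(sum(R[i][j] * v[j] for j in range(n))) for i in range(m)]
        # Fix u, update v: v_j = sign((R^T u)_j).
        v = [sign(sum(R[i][j] * u[i] for i in range(m))) for j in range(n)]
        # Fix u and v, update d: closed-form least-squares scale.
        # d is non-negative after the sign updates, so it does not flip them.
        d = sum(u[i] * R[i][j] * v[j] for i in range(m) for j in range(n)) / (m * n)
    return u, d, v


def semi_binary_decompose(W, K, iters=20):
    """Greedy fit: each step fits one rank-one term to the current residual."""
    m, n = len(W), len(W[0])
    R = [row[:] for row in W]  # residual, starts as a copy of W
    U_cols, D, V_rows = [], [], []
    for _ in range(K):
        u, d, v = fit_rank_one(R, iters)
        U_cols.append(u)
        D.append(d)
        V_rows.append(v)
        R = [[R[i][j] - d * u[i] * v[j] for j in range(n)] for i in range(m)]
    return U_cols, D, V_rows, R


def frob2(M):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in M for x in row)


W = [[0.9, -0.1, 0.4], [-0.6, 0.3, -0.8]]  # toy weight matrix
for K in (1, 2, 3):
    U_cols, D, V_rows, R = semi_binary_decompose(W, K)
    print(K, D, frob2(R))
```

Each rank-one step removes (uᵀRv)²/(mn) from the squared residual, so the printed error does not increase as K grows. Like most alternating schemes, the result depends on the initialization and may stop at a local optimum.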

3.3 Featuremap-Oriented Semi-Binary Factors

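Directly applying semi-binary decomposition to W in every layer of the network has two drawbacks: 1) in the forward pass the weights are multiplied by the input feature map, so the binary quantization error is amplified by the input feature map; 2) applying semi-binary decomposition to the whole network directly causes a large drop in accuracy, because the quantization error accumulates across layers.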

How can this be addressed? Learn the semi-binary components by minimizing the quantization loss of the output feature map.
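One way to write this idea down is sketched below. The notation is assumed for illustration: X denotes the input feature map unfolded into a matrix, so that WX is the full-precision output, and the shapes follow a layer with T kernels of size c × d × d and inner dimension K. The paper's exact formulation may differ.

```latex
\min_{U, D, V} \; \lVert W X - U D V X \rVert_F^2
\quad \text{s.t.} \quad
U \in \{-1,+1\}^{T \times K},\;
V \in \{-1,+1\}^{K \times c d^2},\;
D \text{ diagonal}
```

Compared with the weight-space loss ‖W − UDV‖², this objective weights the approximation error by the inputs the layer actually sees, which targets the first drawback listed above.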

3.4 Fine-tuning

After the decomposition, a convolutional layer with T convolutional kernels of size c × d × d is replaced with three layers: a convolutional layer conv v, a scale layer scale d, and a convolutional layer conv u. Layer conv v has K convolutional kernels of size c × d × d, layer conv u has T convolutional kernels of size K × 1 × 1, and layer scale d has only K parameters.
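The layer sizes above translate directly into parameter and storage counts. The sketch below assumes 1 bit per binary weight and 32 bits per real-valued parameter. The function names and the example sizes (T = 64, c = 64, d = 3, K = 64) are hypothetical and are not results from the paper.

```python
def layer_sizes(T, c, d, K):
    """Parameter counts before and after the three-layer replacement."""
    return {
        "original_real": T * c * d * d,  # full-precision weights of the original layer
        "conv_v_binary": K * c * d * d,  # K binary kernels of size c x d x d
        "scale_d_real": K,               # one real-valued scale per intermediate channel
        "conv_u_binary": T * K,          # T binary kernels of size K x 1 x 1
    }


def storage_bits(T, c, d, K, real_bits=32):
    """Storage in bits, assuming 1 bit per binary weight (an assumption)."""
    s = layer_sizes(T, c, d, K)
    original = s["original_real"] * real_bits
    decomposed = (s["conv_v_binary"] + s["conv_u_binary"]
                  + s["scale_d_real"] * real_bits)
    return original, decomposed


# Hypothetical layer: 64 kernels of size 64 x 3 x 3, decomposed with K = 64.
original, decomposed = storage_bits(T=64, c=64, d=3, K=64)
print(layer_sizes(64, 64, 3, 64))
print(original, decomposed, original / decomposed)
```

Because K sets the size of both binary layers, it controls the trade-off between storage and how closely UDV can approximate W.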

3.5 Complexity Analysis