Note: this post contains my notes from reading "How the backpropagation algorithm works", which explains the backpropagation algorithm in detail and proves the first two of its four fundamental equations. Following the author's approach, I proved the remaining two and recorded the process here, in the hope that it helps anyone who wants to understand the mathematics behind backpropagation.
The four fundamental equations of backpropagation:
\delta^L = \nabla_aC\odot \sigma'(z^L) \tag{BP1}
\delta^l = ((w^{l+1})^\mathrm{T}\delta^{l+1})\odot\sigma'(z^l) \tag{BP2}
\frac{\partial C}{\partial b^l_j} = \delta^l_j \tag{BP3}
\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\delta^l_j \tag{BP4}
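Before the derivations, here is a minimal numerical sketch (my own addition, not from the original post) of how BP1–BP4 fit together: a two-layer sigmoid network with the quadratic cost C = ½‖a^L − y‖², implemented in NumPy. All names here (`backprop`, `w1`, `b1`, …) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, w1, b1, w2, b2):
    """Gradients of the quadratic cost C = 0.5*||a2 - y||^2 for a
    two-layer sigmoid network, computed via BP1-BP4."""
    # forward pass: z^l = w^l a^{l-1} + b^l, a^l = sigma(z^l)
    z1 = w1 @ x + b1
    a1 = sigmoid(z1)
    z2 = w2 @ a1 + b2
    a2 = sigmoid(z2)
    # BP1: output-layer error; nabla_a C = a2 - y for the quadratic cost
    delta2 = (a2 - y) * sigmoid_prime(z2)
    # BP2: propagate the error back through w2
    delta1 = (w2.T @ delta2) * sigmoid_prime(z1)
    # BP3: dC/db^l_j = delta^l_j
    grad_b1, grad_b2 = delta1, delta2
    # BP4: dC/dw^l_{jk} = a^{l-1}_k * delta^l_j (an outer product)
    grad_w1 = delta1 @ x.T
    grad_w2 = delta2 @ a1.T
    return grad_w1, grad_b1, grad_w2, grad_b2
```

A finite-difference check on any single weight is a quick way to confirm that the four formulas are implemented correctly.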
We now derive each of these equations in turn.
Define the error of the j-th neuron in the final layer L as
\delta^L_j = \frac{\partial C}{\partial z^L_j} \tag{1}
By the chain rule, we obtain
\delta^L_j = \frac{\partial C}{\partial a^L_j}\frac{\partial a^L_j}{\partial z^L_j} \tag{2}
Substituting a^L_j=\sigma(z^L_j) gives
\delta^L_j = \frac{\partial C}{\partial a^L_j}\sigma'(z^L_j) \tag{3}
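As a concrete example (my own addition, assuming the quadratic cost), take
C = \frac{1}{2}\sum\limits_k(a^L_k - y_k)^2
Then \frac{\partial C}{\partial a^L_j} = a^L_j - y_j, and equation 3 becomes
\delta^L_j = (a^L_j - y_j)\sigma'(z^L_j)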
Equation BP1 is simply the matrix form of the expression above.
For the j-th neuron in layer l, the chain rule gives:
\delta^l_j = \frac{\partial C}{\partial z^l_j} = \sum\limits_k\frac{\partial C}{\partial z^{l+1}_k}\frac{\partial z^{l+1}_k}{\partial z^l_j} = \sum\limits_k\frac{\partial z^{l+1}_k}{\partial z^l_j}\delta^{l+1}_k \tag{4}
In addition, we have
z^{l+1}_k = \sum\limits_jw^{l+1}_{kj}a^l_j + b^{l+1}_k = \sum\limits_jw^{l+1}_{kj}\sigma(z^l_j) + b^{l+1}_k \tag{5}
Differentiating the above with respect to z^l_j gives
\frac{\partial z^{l+1}_k}{\partial z^l_j} = w^{l+1}_{kj}\sigma'(z^l_j) \tag{6}
Substituting this into equation 4, we get
\delta^l_j = \sum\limits_kw^{l+1}_{kj}\delta^{l+1}_k\sigma'(z^l_j) \tag{7}
which is exactly the component form of BP2, since ((w^{l+1})^\mathrm{T}\delta^{l+1})_j = \sum_k w^{l+1}_{kj}\delta^{l+1}_k.
Turning to BP3: for the j-th neuron in layer l, the chain rule gives
\frac{\partial C}{\partial b^l_j} = \frac{\partial C}{\partial z^l_j}\frac{\partial z^l_j}{\partial b^l_j} \tag{8}
Since \frac{\partial z^l_j}{\partial b^l_j} is identically 1, we obtain
\frac{\partial C}{\partial b^l_j} = \frac{\partial C}{\partial z^l_j} = \delta^l_j \tag{9}
Because z^{l}_j = \sum\limits_kw^{l}_{jk}a^{l-1}_k + b^{l}_j, taking the derivative with respect to w^l_{jk} gives
\frac{\partial z^l_j}{\partial w^{l}_{jk}} = a^{l-1}_k \tag{10}
Finally, for BP4: applying the chain rule to the j-th neuron in layer l gives
\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial z^l_{j}} \frac{\partial z^l_{j}}{\partial w^l_{jk}} \tag{11}
Substituting \delta^l_j = \frac{\partial C}{\partial z^l_j} and \frac{\partial z^l_j}{\partial w^l_{jk}} = a^{l-1}_k into equation 11, we get
\frac{\partial C}{\partial w^l_{jk}} = \delta^l_ja^{l-1}_k \tag{12}
And with that, the derivation is complete!
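To close, a minimal standalone sketch (my own addition; the function names are hypothetical) of how these gradients drive learning: for a single sigmoid neuron with quadratic cost, BP1–BP4 collapse to scalar formulas, and a gradient-descent step w ← w − η·∂C/∂w reduces the cost.

```python
import math

def cost(w, b, x, y):
    """Quadratic cost C = 0.5*(a - y)^2 for one sigmoid neuron."""
    a = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return 0.5 * (a - y) ** 2

def grad_single_neuron(w, b, x, y):
    """BP1-BP4 for a one-neuron 'network': returns (dC/dw, dC/db)."""
    z = w * x + b                    # weighted input
    a = 1.0 / (1.0 + math.exp(-z))  # activation sigma(z)
    sigma_prime = a * (1.0 - a)     # sigma'(z) for the sigmoid
    delta = (a - y) * sigma_prime   # BP1: delta = (dC/da) * sigma'(z)
    return delta * x, delta         # BP4: a^{l-1}*delta, BP3: delta
```

Repeating the update `w -= eta * dC_dw` drives the cost down, which is the whole point of computing these four quantities.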