Recently, whether Attention can actually explain a model's outputs has come under increasing scrutiny; see "Attention is not Explanation"[1] and "Attention is not not Explanation"[2]. Today we introduce a more principled and effective method for attributing a model's output to its inputs: Integrated Gradients, from Google's 2017 paper "Axiomatic Attribution for Deep Networks"[3]. In practice, the attribution for the i-th input feature is computed as a discrete (Riemann-sum) approximation of a path integral along the straight line from a baseline x' to the input x:
\begin{array}{r}
\text{IntegratedGrads}_{i}^{approx}(x) ::= \left(x_{i}-x_{i}^{\prime}\right) \times \sum_{k=1}^{m} \frac{\partial F\left(x^{\prime}+\frac{k}{m} \times\left(x-x^{\prime}\right)\right)}{\partial x_{i}} \times \frac{1}{m}
\end{array}
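The following is a minimal sketch (not from the original paper's codebase) of how this approximation can be implemented with PyTorch autograd; the function name `integrated_gradients` and its parameters (`model`, `baseline`, `target_idx`, number of steps `m`) are illustrative assumptions:

```python
import torch

def integrated_gradients(model, x, baseline, target_idx, m=50):
    """Riemann-sum approximation of Integrated Gradients.

    model      : callable mapping an input tensor to output scores F(x)
    x          : input tensor of shape (d,)
    baseline   : baseline x' with the same shape as x (e.g. all zeros)
    target_idx : index of the output score to attribute
    m          : number of interpolation steps
    """
    grads = []
    for k in range(1, m + 1):
        # Interpolated point x' + (k/m) * (x - x') on the straight-line path
        point = (baseline + (k / m) * (x - baseline)).clone().detach().requires_grad_(True)
        score = model(point)[target_idx]
        # Gradient of the target output w.r.t. the interpolated input
        grad = torch.autograd.grad(score, point)[0]
        grads.append(grad)
    # Average the m gradients (the Σ ... × 1/m term), then scale by (x_i - x'_i)
    avg_grad = torch.stack(grads).mean(dim=0)
    return (x - baseline) * avg_grad
```

Averaging the gradients over m interpolation points is exactly the summation term in the formula above; multiplying element-wise by (x - x') then yields one attribution score per input feature. Larger m gives a closer approximation to the true path integral at the cost of more forward/backward passes.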