“Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate[J]. Computer Science, 2014.
分别表示前三个单词对第一个词语的翻译具有的影响力。
用以表示在翻译第二个单词时,要分别放多少注意力在前三个单词上。并且前一步翻译的输出也会作为下一步的输入。
a^{
a^{
“[1] Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate[J]. Computer Science, 2014. [2] Xu K, Ba J, Kiros R, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention[J]. Computer Science, 2015:2048-2057.
a^{
y^{}
a^{
s^{}
a^{
缺点 这个算法的缺点是其算法的复杂度是
,如果有
个输入单词和
个输出单词,于是注意力参数的总数就是
[1]
吴恩达老师课程原地址: https://mooc.study.163.com/smartSpec/detail/1001319001.htm