Learning roadmap:
The following Bilibili videos are a good place to start:
(1) Word embeddings and the ELMo model: https://www.bilibili.com/video/av89296151?p=1
(2) Self-Attention and the Transformer: https://www.bilibili.com/video/av89296151?p=2 (lecture by Prof. Hung-yi Lee (李宏毅): https://www.bilibili.com/video/av56239558?from=search&seid=16574761851422607319)
(3) From the Transformer to BERT: https://www.bilibili.com/video/av89296151?p=3
(4) ALBERT: https://www.bilibili.com/video/av89296151?p=4
Further reading:
(5) Why BERT has three embedding layers and how each is implemented: https://www.cnblogs.com/d0main/p/10447853.html