BERT,论文:“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”,论文作者:Jacob Devlin, Ming-Wei Chang, Kenton Lee,Kristina Toutanova
OpenAI 的GPT,论文:“Improving Language Understanding by Generative Pre-Training”,论文作者:Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
OpenAI的GPT-2,论文:“Language Models are Unsupervised Multitask Learners”,论文作者:Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei,Ilya Sutskever
谷歌和CMU的Transformer-XL,论文:“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”,论文作者:Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
谷歌和CMU的XLNet,论文:“XLNet: Generalized Autoregressive Pretraining for Language Understanding”,论文作者:Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Facebook的XLM,论文:“Cross-lingual Language Model Pretraining”,论文作者:Guillaume Lample,Alexis Conneau