Choosing a PLM (pre-trained language model): Using PLMs significantly improves performance on all popular text classification tasks, and autoencoding PLMs (e.g., BERT or RoBERTa) usually perform better than autoregressive PLMs (e.g., OpenAI GPT). Hugging Face hosts a rich repository of PLMs developed for a wide range of tasks.
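As a minimal sketch of this workflow, the Python snippet below loads an autoencoding PLM from the Hugging Face hub and attaches a classification head; the model name, label count, and example text are illustrative assumptions, not choices made in this survey, and the head would still need fine-tuning on task-specific labels before its predictions are meaningful.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative choices: any hub checkpoint and label count could be used here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A fresh classification head with `num_labels` outputs is placed on top of
# the pretrained encoder; only this head is randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4
)

inputs = tokenizer("An example document to classify.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)

Because the encoder weights are already pretrained, fine-tuning this model typically converges with far less labeled data than training the same architecture from scratch.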
WOS. The Web of Science (WOS) dataset [22] is a collection of data and metadata on published papers, available from Web of Science, the world's most trusted publisher-independent global citation database. Three versions of WOS have been released: WOS-46985, WOS-11967, and WOS-5736. WOS-46985 is the full dataset; WOS-11967 and WOS-5736 are two subsets of WOS-46985.
Few-Shot and Zero-Shot Learning. Most DL models are supervised models that require large amounts of domain-specific labels. In practice, collecting such labels for every new domain is expensive. Fine-tuning a PLM (e.g., BERT or OpenGPT) on a specific task requires far fewer domain labels than training a model from scratch, which opens opportunities for developing PLM-based zero-shot or few-shot learning methods, as sketched below.
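One common zero-shot approach recasts classification as natural language inference over candidate label descriptions, so that an NLI-fine-tuned PLM can score labels it has never been trained on. The sketch below uses Hugging Face's zero-shot classification pipeline to illustrate the idea; the model checkpoint, input sentence, and candidate labels are illustrative assumptions rather than settings from this survey.

from transformers import pipeline

# The pipeline wraps an NLI model (here BART fine-tuned on MultiNLI) and
# treats each candidate label as a hypothesis to be entailed by the input.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The match went into extra time before the home team scored.",
    candidate_labels=["sports", "politics", "technology"],
)
# `result["labels"]` lists the candidates sorted by entailment score; no
# task-specific labeled examples are used at any point.
print(result["labels"][0], result["scores"][0])

The same PLM can instead be fine-tuned on a handful of labeled examples per class, which is the few-shot counterpart of this zero-shot setup.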
[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[2] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, "GLUE: A multi-task benchmark and analysis platform for natural language understanding," arXiv preprint arXiv:1804.07461, 2018.
[3] Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, and H. Poon, "Domain-specific language model pretraining for biomedical natural language processing," arXiv preprint arXiv:2007.15779, 2020.
[4] S. Mukherjee and A. H. Awadallah, "XtremeDistil: Multi-stage distillation for massive multilingual models," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 2221–2234.
[5] R. Tang, Y. Lu, L. Liu, L. Mou, O. Vechtomova, and J. Lin, "Distilling task-specific knowledge from BERT into simple neural networks," arXiv preprint arXiv:1903.12136, 2019.
[8] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up?: Sentiment classification using machine learning techniques," in Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing, 2002, pp. 79–86.
[9] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts, "Recursive deep models for semantic compositionality over a sentiment treebank," in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 1631–1642.
[10] L. Deng and J. Wiebe, "MPQA 3.0: An entity/event-level sentiment corpus," in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, pp. 1323–1328.
[12] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Advances in Neural Information Processing Systems, 2015, pp. 649–657.
[13] http://qwone.com/~jason/20Newsgroups/.
[14] C. Sun, X. Qiu, Y. Xu, and X. Huang, "How to fine-tune BERT for text classification?" in China National Conference on Chinese Computational Linguistics. Springer, 2019, pp. 194–206.
[15] https://martin-thoma.com/nlp-reuters.
[16] F. Wang, Z. Wang, Z. Li, and J.-R. Wen, "Concept-based short text classification and ranking," in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, 2014, pp. 1069–1078.
[17] D. Greene and P. Cunningham, "Practical solutions to the problem of diagonal dominance in kernel document clustering," in Proc. 23rd International Conference on Machine Learning (ICML'06). ACM Press, 2006, pp. 377–384.
[18] A. S. Das, M. Datar, A. Garg, and S. Rajaram, "Google news personalization: scalable online collaborative filtering," in Proceedings of the 16th International Conference on World Wide Web. ACM, 2007, pp. 271–280.
[19] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. Van Kleef, S. Auer et al., "DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia," Semantic Web, vol. 6, no. 2, pp. 167–195, 2015.
[21] E. L. Mencia and J. Fürnkranz, "Efficient pairwise multilabel classification for large-scale problems in the legal domain," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2008, pp. 50–65.
[22] K. Kowsari, D. E. Brown, M. Heidarysafa, K. J. Meimandi, M. S. Gerber, and L. E. Barnes, "HDLTex: Hierarchical deep learning for text classification," in 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2017, pp. 364–371.
[23] Z. Lu, "PubMed and beyond: a survey of web tools for searching biomedical literature," Database, vol. 2011, 2011.
[24] F. Dernoncourt and J. Y. Lee, "PubMed 200k RCT: a dataset for sequential sentence classification in medical abstracts," arXiv preprint arXiv:1710.06071, 2017.
[25] B. C. Wallace, L. Kertz, E. Charniak et al., "Humans require context to infer ironic intent (so computers probably do, too)," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2014, pp. 512–516.
[26] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, "SQuAD: 100,000+ questions for machine comprehension of text," arXiv preprint arXiv:1606.05250, 2016.
[27] P. Rajpurkar, R. Jia, and P. Liang, "Know what you don't know: Unanswerable questions for SQuAD," arXiv preprint arXiv:1806.03822, 2018.
[28] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng, "MS MARCO: A human-generated machine reading comprehension dataset," 2016.
[29] https://cogcomp.seas.upenn.edu/Data/QA/QC/.
[30] Y. Yang, W.-t. Yih, and C. Meek, "WikiQA: A challenge dataset for open-domain question answering," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 2013–2018.
[32] R. Zellers, Y. Bisk, R. Schwartz, and Y. Choi, "SWAG: A large-scale adversarial dataset for grounded commonsense inference," arXiv preprint arXiv:1808.05326, 2018.
[33] T. Jurczyk, M. Zhai, and J. D. Choi, "SelQA: A new benchmark for selection-based question answering," in 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2016, pp. 820–827.
[34] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," arXiv preprint arXiv:1508.05326, 2015.
[35] A. Williams, N. Nangia, and S. R. Bowman, "A broad-coverage challenge corpus for sentence understanding through inference," arXiv preprint arXiv:1704.05426, 2017.
[36] M. Marelli, L. Bentivogli, M. Baroni, R. Bernardi, S. Menini, and R. Zamparelli, "SemEval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment," in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014, pp. 1–8.
[37] B. Dolan, C. Quirk, and C. Brockett, "Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources," in Proceedings of the 20th International Conference on Computational Linguistics. ACL, 2004, p. 350.
[38] D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, and L. Specia, "SemEval-2017 task 1: Semantic textual similarity multilingual and cross-lingual focused evaluation," arXiv preprint arXiv:1708.00055, 2017.
[39] I. Dagan, O. Glickman, and B. Magnini, "The PASCAL Recognising Textual Entailment Challenge," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2006.
[40] T. Khot, A. Sabharwal, and P. Clark, "SciTail: A textual entailment dataset from science question answering," in 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018.