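Twitter is one of the most influential social media platforms, with millions of active users. It is commonly used for microblogging, letting users share messages, ideas, thoughts, and more. As a result, millions of interactions, such as short messages or tweets, flow among Twitter users discussing topics from around the world. This study aims to analyze user sentiment toward a particular trending topic that was being actively and widely discussed at the time. We chose the hashtag #kpujangancurang, a trending topic during the 2019 Indonesian presidential election. We used the hashtag to obtain a dataset from Twitter and to investigate whether users' tweets express positive or negative sentiment. The study uses the RapidMiner tool to generate the Twitter dataset and compares the Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods for classifying the sentiment of the tweets. The experiment uses 200 labeled data items in total. Overall, Naive Bayes and Multi-Layer Perceptron outperformed the other two methods across 11 experiments with different training-testing split sizes. These two classifiers have the potential to be used to build sentiment analyzers for low-resource languages with small corpora.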
Original title: Towards A Sentiment Analyzer for Low-Resource Languages
Original abstract: Twitter is one of the top influenced social media which has a million number of active users. It is commonly used for microblogging that allows users to share messages, ideas, thoughts and many more. Thus, millions interaction such as short messages or tweets are flowing around among the twitter users discussing various topics that has been happening world-wide. This research aims to analyse a sentiment of the users towards a particular trending topic that has been actively and massively discussed at that time. We chose a hashtag #kpujangancurang that was the trending topic during the Indonesia presidential election in 2019. We use the hashtag to obtain a set of data from Twitter to analyse and investigate further the positive or the negative sentiment of the users from their tweets. This research utilizes rapid miner tool to generate the twitter data and comparing Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods to classify the sentiment of the twitter data. There are overall 200 labeled data in this experiment. Overall, Naive Bayes and Multi-Layer Perceptron classification outperformed the other two methods on 11 experiments with different size of training-testing data split. The two classifiers are potential to be used in creating sentiment analyzer for low-resource languages with small corpus.
Original authors: Dian Indriani, Arbi Haza Nasution, Winda Monika, Salhazan Nasution
Original URL: https://arxiv.org/abs/2011.06382
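
To make the evaluation protocol in the abstract more concrete, here is a minimal Python sketch of one piece of it: training a Naive Bayes sentiment classifier and scoring it at several training-testing split sizes. This is not the authors' pipeline (they used RapidMiner), and everything here is an assumption for illustration: the hand-written multinomial Naive Bayes, the function names `train_naive_bayes`, `predict`, and `accuracy_for_split`, the toy English examples in `TOY`, and the split fractions. The paper's actual 200 labeled tweets and its 11 split sizes are not reproduced.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy data standing in for labeled tweets (NOT from the paper).
TOY = [
    (["fair", "honest", "count"], "pos"),
    (["trust", "fair", "result"], "pos"),
    (["honest", "good", "process"], "pos"),
    (["good", "result", "trust"], "pos"),
    (["fraud", "cheat", "count"], "neg"),
    (["unfair", "fraud", "result"], "neg"),
    (["cheat", "bad", "process"], "neg"),
    (["bad", "result", "fraud"], "neg"),
]


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_for_split(docs, labels, train_fraction, seed=0):
    """Shuffle, split by train_fraction, train on one part, score on the rest."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_fraction)
    train, test = idx[:cut], idx[cut:]
    if not train or not test:
        raise ValueError("train_fraction must leave both parts non-empty")
    model = train_naive_bayes([docs[i] for i in train], [labels[i] for i in train])
    hits = sum(predict(model, docs[i]) == labels[i] for i in test)
    return hits / len(test)


if __name__ == "__main__":
    docs = [d for d, _ in TOY]
    labels = [y for _, y in TOY]
    # Placeholder split sizes; the paper's 11 splits are not listed in the abstract.
    for fraction in (0.5, 0.75):
        print(fraction, accuracy_for_split(docs, labels, fraction))
```

Comparing classifiers as the abstract describes would mean repeating the same loop with K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron models in place of `train_naive_bayes` and `predict`, then comparing accuracy at each split size.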