
Utilizing Deep Learning to Identify Drug Use on Twitter (CS SI)

用户7095611
Modified 2020-03-26 14:42:19

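Collecting and analyzing social media has become an effective way to study users' mental activity and behavioral tendencies. This work analyzes collected Twitter data to build models that classify drug-related tweets. A set of tweets was gathered using topic-related keywords, such as slang and methods of drug consumption. Preprocessing the potential candidates yielded a dataset of 3,696,150 rows. The classification power of several methods was compared, including support vector machines (SVM), XGBoost, and classifiers based on convolutional neural networks (CNN). Rather than relying on simple feature or attribute analysis, a deep learning approach was used to screen and analyze the semantic meaning of the tweets. The two CNN-based classifiers gave the best results of the methods compared. The first was trained on 2,661 manually labeled samples; the second also included synthetically generated tweets, for a total of 12,142 samples. Their accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91, respectively. In addition, association rule mining showed that commonly mentioned drugs correspond to some degree with frequently used illicit substances, demonstrating the practical usefulness of the system. Finally, the synthetically augmented training set produced higher scores, improving classification capability and demonstrating the value of this approach.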
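The summary reports two different metrics for each classifier, accuracy and AUC. As a rough illustration of how the two can differ on the same predictions, the sketch below computes both from scratch. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for this example; they are not taken from the paper.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, made up for illustration (1 = drug-related, 0 = not).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.45, 0.3, 0.1]

toy_accuracy = accuracy(toy_labels, toy_scores)  # depends on the threshold
toy_auc = roc_auc(toy_labels, toy_scores)        # depends only on the ranking
print(f"accuracy={toy_accuracy:.3f}  auc={toy_auc:.3f}")
```

Accuracy changes with the chosen threshold, while AUC depends only on how the scores rank positives against negatives, which is one reason a classifier can show a moderate accuracy alongside a high AUC.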

Original title: Utilizing Deep Learning to Identify Drug Use on Twitter Data

Original abstract: The collection and examination of social media has become a useful mechanism for studying the mental activity and behavior tendencies of users. Through the analysis of collected Twitter data, models were developed for classifying drug-related tweets. Using topic pertaining keywords, such as slang and methods of drug consumption, a set of tweets was generated. Potential candidates were then preprocessed resulting in a dataset of 3,696,150 rows. The classification power of multiple methods was compared including support vector machines (SVM), XGBoost, and convolutional neural network (CNN) based classifiers. Rather than simple feature or attribute analysis, a deep learning approach was implemented to screen and analyze the tweets' semantic meaning. The two CNN-based classifiers presented the best result when compared against other methodologies. The first was trained with 2,661 manually labeled samples, while the other included synthetically generated tweets culminating in 12,142 samples. The accuracy scores were 76.35% and 82.31%, with an AUC of 0.90 and 0.91. Additionally, association rule mining showed that commonly mentioned drugs had a level of correspondence with frequently used illicit substances, proving the practical usefulness of the system. Lastly, the synthetically generated set provided increased scores, improving the classification capability and proving the worth of this methodology.

Original author: Joseph Tassone

Original URL: https://arxiv.org/abs/2003.11522
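The abstract mentions association rule mining over the drugs named in tweets. The following is a minimal sketch of the general idea, restricted to rules between pairs of items and scored by support and confidence. It is not the authors' pipeline: the function name mine_pair_rules, the thresholds, and the toy substance mentions are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def mine_pair_rules(transactions: List[Set[str]],
                    min_support: float,
                    min_confidence: float) -> List[Rule]:
    """Find rules A -> B between single items that co-occur often enough."""
    n = len(transactions)
    counts: Dict[FrozenSet[str], int] = {}
    for items in transactions:
        # Count each single item and each unordered pair once per transaction.
        for item in items:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
        for pair in combinations(sorted(items), 2):
            key = frozenset(pair)
            counts[key] = counts.get(key, 0) + 1

    rules: List[Rule] = []
    for itemset, count in counts.items():
        support = count / n
        if len(itemset) != 2 or support < min_support:
            continue
        a, b = sorted(itemset)
        for antecedent, consequent in ((a, b), (b, a)):
            # confidence(A -> B) = count(A and B) / count(A)
            confidence = count / counts[frozenset([antecedent])]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical data: the set of substances mentioned in each of five tweets.
toy_mentions = [
    {"cannabis", "alcohol"},
    {"cannabis", "alcohol"},
    {"cannabis"},
    {"alcohol", "cocaine"},
    {"cannabis", "alcohol", "cocaine"},
]

toy_rules = mine_pair_rules(toy_mentions, min_support=0.3, min_confidence=0.7)
for antecedent, consequent, support, confidence in toy_rules:
    print(f"{antecedent} -> {consequent}  "
          f"support={support:.2f}  confidence={confidence:.2f}")
```

A full Apriori or FP-Growth implementation would also handle larger itemsets and prune candidates level by level; the pairwise version here only shows how support and confidence are derived from co-mention counts.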

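Original-content statement: This article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.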

In case of infringement, please contact cloudcommunity@tencent.com for removal.
