Journal: Engineering and Applied Science Research

Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension


Abstract

Asia currently faces a narcotic drug addiction problem. On social networking services such as Twitter, some drug-addicted users discuss behaviours related to narcotic drugs. This research proposes a new Narcotic-related Tweet Classification Model (NTCM) built on data preprocessing. Two new data preprocessing methods, Sentence Vector of Word Embedding (SVWE) and Sentence Vector of Word Embedding with Feature Extension (SWEF), are introduced to prepare data for the NTCM. The proposed preprocessing method produces SVWE by reducing the dataset. The word embeddings are generated by deep neural networks using the skip-gram model. The authors further extended SVWE with additional features to produce a new dataset called SWEF; both datasets were used as input to the NTCM. The authors collected data from Twitter in Asia using keywords related to narcotic drugs. They evaluated text classification with a Support Vector Machine, Logistic Regression, a Decision Tree, and a Convolutional Neural Network. Logistic Regression with SWEF was the best approach for the NTCM and outperformed state-of-the-art methods. The proposed NTCM achieved an accuracy of 0.8964, an F-measure of 0.895, an AUC of 0.949, a Kappa of 0.7131, an MCC of 0.714, and a low running time of 1.04 seconds.
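The abstract does not spell out how a sentence vector is built from word embeddings or which features SWEF adds. The sketch below is one plausible reading, not the authors' method. It assumes SVWE is the element-wise mean of a tweet's skip-gram word vectors and that SWEF concatenates extra features onto that vector. The two extra features (narcotic-keyword count and token count) and all function names are hypothetical. The word vectors are assumed to have been trained elsewhere; for example, gensim's `Word2Vec` selects skip-gram with `sg=1`.

```python
from typing import Dict, List, Sequence, Set

# Assumption: SVWE is modelled as the element-wise mean of the tweet's
# skip-gram word vectors; the paper's exact construction may differ.
def sentence_vector(tokens: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens into one sentence vector."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No token has an embedding: fall back to a zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical extension features; the real SWEF features are not listed in the abstract.
def extra_features(tokens: Sequence[str], keywords: Set[str]) -> List[float]:
    """Return [narcotic-keyword count, token count] as illustrative features."""
    return [float(sum(t in keywords for t in tokens)), float(len(tokens))]


def swef_vector(tokens: Sequence[str],
                embeddings: Dict[str, List[float]],
                dim: int,
                keywords: Set[str]) -> List[float]:
    """Assumed SWEF: the sentence vector concatenated with the extra features."""
    return sentence_vector(tokens, embeddings, dim) + extra_features(tokens, keywords)


if __name__ == "__main__":
    toy_embeddings = {"buy": [1.0, 0.0], "ice": [0.0, 1.0]}  # toy 2-d vectors
    tweet = ["buy", "ice", "tonight"]  # "tonight" is out of vocabulary
    print(sentence_vector(tweet, toy_embeddings, 2))       # [0.5, 0.5]
    print(swef_vector(tweet, toy_embeddings, 2, {"ice"}))  # [0.5, 0.5, 1.0, 3.0]
```

Each resulting fixed-length vector could then be fed to a classifier such as scikit-learn's `LogisticRegression`, which the abstract reports as the best-performing model with SWEF. Averaging keeps the vector length independent of tweet length. Concatenation lets the classifier weight the extra features separately from the embedding dimensions.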
