Conference: International Conference on Brain-Inspired Cognitive Systems

A Semi-supervised Corpus Annotation for Saudi Sentiment Analysis Using Twitter

Abstract

Limited work has been reported in the literature on developing sentiment resources for the Saudi dialect. The lack of resources such as dialectal lexicons and corpora is one of the major bottlenecks to the successful development of Arabic sentiment analysis models. In this paper, a semi-supervised approach is presented to construct an annotated sentiment corpus for the Saudi dialect using Twitter. The presented approach is primarily based on a list of lexicons built using word embedding techniques such as word2vec. A large corpus extracted from Twitter is annotated and then manually reviewed to exclude incorrectly annotated tweets; the resulting corpus is publicly available. For corpus validation, state-of-the-art classification algorithms (such as Logistic Regression, Support Vector Machine, and Naive Bayes) are applied and evaluated. Simulation results demonstrate that the Naive Bayes algorithm outperformed all other approaches and achieved an accuracy of up to 91%.
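
The lexicon-driven annotation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for trained word2vec embeddings, the English words stand in for Saudi dialect terms, and the names `EMBEDDINGS`, `expand_lexicon`, `annotate`, and the 0.95 similarity threshold are all hypothetical choices made for illustration.

```python
import math

# Toy vectors standing in for word2vec embeddings (hypothetical values).
EMBEDDINGS = {
    "good": (0.9, 0.1, 0.0),
    "great": (0.85, 0.15, 0.05),
    "bad": (0.1, 0.9, 0.0),
    "awful": (0.15, 0.85, 0.05),
    "traffic": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, embeddings, threshold=0.95):
    """Return the seeds plus every word whose vector is close to a seed."""
    known_seeds = [s for s in seeds if s in embeddings]
    expanded = set(seeds)
    for word, vec in embeddings.items():
        if any(cosine(vec, embeddings[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return expanded


def annotate(tweet, positive, negative):
    """Label a tweet by counting lexicon hits; None means no clear label."""
    tokens = tweet.lower().split()
    pos_hits = sum(token in positive for token in tokens)
    neg_hits = sum(token in negative for token in tokens)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return None  # ambiguous or no hits: leave for manual review


positive_lexicon = expand_lexicon({"good"}, EMBEDDINGS)
negative_lexicon = expand_lexicon({"bad"}, EMBEDDINGS)

tweets = ["The service was great", "awful traffic today", "traffic today"]
labels = [annotate(t, positive_lexicon, negative_lexicon) for t in tweets]
```

In this sketch, tweets that receive `None` are the ones a manual review step would need to handle, and the labeled remainder could then serve as training data for classifiers such as Naive Bayes.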
