Journal: ACM Transactions on Asian Language Information Processing

CLASENTI: A Class-Specific Sentiment Analysis Framework

Abstract

Arabic text sentiment analysis suffers from low accuracy due to Arabic-specific challenges (e.g., limited resources, morphological complexity, and dialects) and general linguistic issues (e.g., fuzziness, implicit sentiment, sarcasm, and spam). The limited-resources problem calls for new and improved Arabic corpora and lexica. We propose a class-specific sentiment analysis (CLASENTI) framework. The framework includes a new annotation approach for building a multi-faceted Arabic corpus and lexicon, allowing simultaneous annotation of different facets, including domains, dialects, linguistic issues, and polarity strengths. Each facet has multiple classes (e.g., the nine classes representing dialects found in the Arab world). The new corpus and lexicon annotations facilitate the development of new class-specific classification models and polarity strength calculation. For sentiment classification, we propose a hybrid model combining corpus-based and lexicon-based models. The corpus-based model is built in two interrelated phases: (1) full-corpus classification models for all facets; and (2) class-specific models trained on subsets of the corpus filtered according to the performance of the full-corpus models. To calculate polarity strengths, the lexicon-based model filters the annotated lexicon by the specific classes of the domain and dialect. As a case study, we collect and annotate 15,274 reviews pertaining to governmental services from various sources, including surveys, Facebook comments, and Twitter posts. In addition, we develop a new web-based application to apply the proposed framework to the case study. The CLASENTI framework reaches up to 95% accuracy and a 93% F1-score, surpassing the best-known sentiment classifiers implemented in the Scikit-learn library, which achieve 82% accuracy and an 81% F1-score for Arabic when tested on the same dataset.
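
The lexicon-based step described in the abstract (filtering the annotated lexicon by domain and dialect class before computing a polarity strength) can be illustrated with a minimal sketch. This is not the authors' implementation: the `LexiconEntry` fields, the function names, the summation rule, and the toy English entries are all assumptions chosen for illustration, since the abstract does not specify how strengths are stored or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    # Hypothetical annotation schema; the paper's actual schema is not given here.
    term: str
    domain: str
    dialect: str
    strength: float  # signed polarity strength; negative means negative sentiment


def filter_lexicon(lexicon, domain, dialect):
    """Keep only entries annotated with the given domain and dialect classes."""
    return {
        entry.term: entry.strength
        for entry in lexicon
        if entry.domain == domain and entry.dialect == dialect
    }


def polarity_strength(tokens, lexicon, domain, dialect):
    """Sum the strengths of tokens found in the class-filtered lexicon.

    Summation is an assumed aggregation rule; tokens missing from the
    filtered lexicon contribute 0.
    """
    filtered = filter_lexicon(lexicon, domain, dialect)
    return sum(filtered.get(token, 0.0) for token in tokens)


# Toy, made-up lexicon (not data from the paper).
LEXICON = [
    LexiconEntry("good", "health", "msa", 1.0),
    LexiconEntry("slow", "health", "msa", -0.5),
    LexiconEntry("slow", "traffic", "gulf", -1.0),
]

score = polarity_strength(["service", "slow", "good"], LEXICON, "health", "msa")
print(score)
```

In this sketch the same term ("slow") carries a different strength depending on the domain and dialect class, which is the point of filtering the lexicon before scoring.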
