Conference: International Conference on Brain-Inspired Cognitive Systems

SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis


Abstract

Data annotation is an important but time-consuming and costly procedure. To sort texts into two classes, the very first thing we need is a good annotation guideline that establishes what is required to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for the Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two scripts widely used on Arabic social media: Arabic and Arabizi. The proposed approach automatically constructs a sentiment corpus containing 8000 messages (4000 in Arabic and 4000 in Arabizi). The achieved F1-score is up to 72% on the Arabic test set and up to 78% on the Arabizi test set. Ongoing work is aimed at integrating a transliteration process for Arabizi messages to further improve the obtained results.
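
The abstract does not spell out how the lexicon drives the annotation, so the following is only a minimal sketch of lexicon-based labeling in general, not the authors' procedure. The lexicon entries are English placeholders rather than items from the SentiALG lexicon, and the names `TOY_LEXICON`, `annotate`, and `build_corpus`, as well as the rule of summing token polarities, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon mapping a token to a polarity score.
# These are placeholder entries, not taken from the SentiALG lexicon.
TOY_LEXICON: Dict[str, int] = {"good": 1, "great": 1, "bad": -1, "awful": -1}


def annotate(message: str, lexicon: Dict[str, int]) -> Optional[str]:
    """Label a message by summing the polarity of the tokens found in the lexicon."""
    score = sum(lexicon.get(token, 0) for token in message.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no net sentiment evidence: the message stays unlabeled


def build_corpus(messages: List[str], lexicon: Dict[str, int]) -> List[Tuple[str, str]]:
    """Keep only the messages that received a label (assumed filtering rule)."""
    corpus = []
    for message in messages:
        label = annotate(message, lexicon)
        if label is not None:
            corpus.append((message, label))
    return corpus
```

Under these assumptions, `build_corpus(["good", "meh", "bad bad"], TOY_LEXICON)` keeps the first and last messages and drops the one with no lexicon hits. Handling two scripts, as the paper does for Arabic and Arabizi, would at least require a separate lexicon or a normalization step per script, which this sketch leaves out.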
