Journal: Procedia Computer Science

AraSenTi-Tweet: A Corpus for Arabic Sentiment Analysis of Saudi Tweets

Abstract

Arabic sentiment analysis is an active research area. However, the Arabic language still lacks sufficient language resources to support sentiment analysis tasks. In this paper, we present the details of collecting and constructing a large dataset of Arabic tweets, and we explain the techniques used to clean and pre-process the collected dataset. A corpus of Arabic tweets annotated for sentiment analysis was extracted from this dataset. The corpus consists mainly of tweets written in Modern Standard Arabic and the Saudi dialect, and it was annotated for sentiment manually. The annotation process is explained in detail, and the challenges encountered during annotation are highlighted. The corpus contains 17,573 tweets labelled with one of four sentiment labels: positive, negative, neutral and mixed. Baseline experiments were conducted to provide benchmark results for future work.