Journal: Procedia Computer Science

The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach

Abstract

Sentiment analysis relies on Natural Language Processing (NLP) techniques and is widely applied to social media content to determine people’s opinions, attitudes, and emotions toward entities, individuals, issues, events, or topics. The accuracy of sentiment analysis depends on automatic Part-of-Speech (PoS) tagging, which labels words according to their grammatical categories. Analyzing the Arabic language has long attracted considerable research interest, and the challenge is now amplified by the addition of social media dialects. While numerous morphological analyzers and PoS taggers have been proposed for Modern Standard Arabic (MSA), there is increasing interest in applying those techniques to the Arabic dialect that is prominent in social media. Indeed, social media texts (e.g., posts, comments, and replies) differ significantly from MSA texts in vocabulary and grammatical structure. Such differences call for revisiting PoS tagging methods to adapt them to social media texts. Furthermore, the lack of sufficiently large and diverse social media text corpora is one of the reasons that automatic PoS tagging of social media content has rarely been studied. In this paper, we address those limitations by proposing a novel Arabic social media text corpus that is enriched with complete PoS information, including tags, lemmas, and synonyms. The proposed corpus constitutes the largest manually annotated Arabic corpus to date, with more than 5 million tokens, 238,600 MSA texts, and words from Arabic social media dialect, collected from 65,000 online users’ accounts. Furthermore, the proposed corpus was used to train a custom Long Short-Term Memory (LSTM) deep learning model, which showed excellent performance in terms of sentiment classification accuracy and F1-score. The obtained results demonstrate that a diverse corpus enriched with PoS information significantly enhances the performance of social media analysis techniques and opens the door to advanced features such as opinion mining and emotion intelligence.
