The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach

Abdul Munem Nerabie; Manar AlKhatib; Sujith Samuel Mathew; May El Barachi; Farhad Oroumchian


Abstract

Sentiment analysis relies on Natural Language Processing (NLP) techniques and is widely applied to social media content to determine people's opinions, attitudes, and emotions toward entities, individuals, issues, events, or topics. Its accuracy depends on automatic Part-of-Speech (PoS) tagging, which labels words according to their grammatical categories. Analyzing Arabic has attracted considerable research interest, and the challenge is now amplified by social media dialects. While numerous morphological analyzers and PoS taggers have been proposed for Modern Standard Arabic (MSA), there is growing interest in applying those techniques to the dialectal Arabic that is prominent on social media. Social media texts (e.g. posts, comments, and replies) differ significantly from MSA texts in vocabulary and grammatical structure, and these differences call for revisiting PoS tagging methods to adapt them to social media texts. Furthermore, the lack of sufficiently large and diverse social media text corpora is one of the reasons that automatic PoS tagging of social media content has rarely been studied. In this paper, we address those limitations by proposing a novel Arabic social media text corpus that is enriched with complete PoS information, including tags, lemmas, and synonyms. The proposed corpus constitutes the largest manually annotated Arabic corpus to date, with more than 5 million tokens, 238,600 MSA texts, and words from Arabic social media dialect, collected from 65,000 online users' accounts. The corpus was used to train a custom Long Short-Term Memory (LSTM) deep learning model, which showed excellent performance in terms of sentiment classification accuracy and F1-score. The results demonstrate that a diverse corpus enriched with PoS information significantly enhances the performance of social media analysis techniques and opens the door to advanced features such as opinion mining and emotion intelligence.
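The abstract does not give the corpus schema or the model's input format, so the following is only a minimal sketch of how a token annotated with a PoS tag, lemma, and synonyms might be represented, turned into padded integer sequences of the kind a sequence model such as an LSTM typically consumes, and scored with accuracy and F1. The class `AnnotatedToken`, the helper names, the tag labels `NOUN`/`ADJ`, the `pos`/`neg` sentiment labels, and the example sentence are all assumptions made for illustration; none of them comes from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedToken:
    """One token with the PoS information the abstract lists (hypothetical schema)."""
    surface: str                      # token as written in the post
    pos_tag: str                      # grammatical category (assumed tag set)
    lemma: str                        # dictionary form
    synonyms: list[str] = field(default_factory=list)


def build_index(items):
    """Map each distinct item to an integer id; id 0 is reserved for padding."""
    index = {}
    for item in items:
        if item not in index:
            index[item] = len(index) + 1
    return index


def encode(sentence, word_index, tag_index, max_len):
    """Return two padded id sequences: one for words, one for PoS tags.

    Unknown words or tags fall back to 0, the same id used for padding.
    """
    words = [word_index.get(t.surface, 0) for t in sentence][:max_len]
    tags = [tag_index.get(t.pos_tag, 0) for t in sentence][:max_len]
    pad = [0] * (max_len - len(words))
    return words + pad, tags + pad


def accuracy_and_f1(gold, predicted, positive="pos"):
    """Accuracy and F1 for one class, computed from gold and predicted labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Illustrative two-token sentence ("the film is wonderful"); not taken from the corpus.
sentence = [
    AnnotatedToken("الفيلم", "NOUN", "فيلم"),
    AnnotatedToken("رائع", "ADJ", "رائع", ["ممتاز"]),
]
word_index = build_index(t.surface for t in sentence)
tag_index = build_index(t.pos_tag for t in sentence)
word_ids, tag_ids = encode(sentence, word_index, tag_index, max_len=4)

# Made-up labels, only to exercise the metric function.
acc, f1 = accuracy_and_f1(["pos", "neg", "pos", "neg"], ["pos", "pos", "neg", "neg"])
```

Keeping the word ids and tag ids as two parallel sequences is one common way to let a model embed each separately and combine them; whether the authors' custom LSTM does this is not stated in the abstract.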


Bibliographic record

Source: Procedia Computer Science, 2021, Issue 1, 8 pages
Authors: Abdul Munem Nerabie; Manar AlKhatib; Sujith Samuel Mathew; May El Barachi; Farhad Oroumchian
Original format: PDF
Keywords: Sentiment Analysis; Part of Speech Tagging; Arabic Language; Dialect Arabic; Neural Network


