ACM Transactions on Asian Language Information Processing

A Sentiment Treebank and Morphologically Enriched Recursive Deep Models for Effective Sentiment Analysis in Arabic

Abstract

Accurate sentiment analysis models encode the sentiment of words and their combinations to predict the overall sentiment of a sentence. This task becomes challenging when applied to morphologically rich languages (MRLs). In this article, we evaluate the use of deep learning advances, namely Recursive Neural Tensor Networks (RNTN), for sentiment analysis in Arabic as a case study of MRLs. Although Arabic may not be representative of all MRLs, the challenges it raises and the solutions proposed for it are common to many other MRLs. We identify, illustrate, and address MRL-related challenges, and show how RNTN is affected by the morphological richness and orthographic ambiguity of Arabic. To address the challenges of extracting sentiment from text in MRLs, we propose to explore different orthographic features as well as different morphological features at multiple levels of abstraction, ranging from raw words to roots. A key requirement for RNTN is the availability of a sentiment treebank: a collection of syntactic parse trees annotated for sentiment at all levels of constituency, which currently exists only in English. Therefore, our contribution also includes the creation of the first Arabic Sentiment Treebank (ARSENTB), which is morphologically and orthographically enriched. Experimental results show that, compared to the basic RNTN proposed for English, our solution achieves significant improvements of up to 8% absolute at the phrase level and 10.8% absolute at the sentence level, measured by average F1 score. It also outperforms well-known classifiers, including Support Vector Machines, Recursive Auto Encoders, and Long Short-Term Memory, by 7.6%, 3.2%, and 1.6% absolute, respectively, with all models trained using similar morphological considerations.
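The abstract relies on the RNTN composition step without spelling it out. The sketch below illustrates the standard RNTN formulation of Socher et al. (2013), not the authors' morphologically enriched variant: for child vectors a and b with x = [a; b], each parent dimension is p_i = tanh(x^T V[i] x + (W x)_i), and a softmax classifier predicts a sentiment distribution at every node, which is why a treebank labeled at all levels of constituency is needed. Everything concrete here is a hypothetical assumption: the token names, vector size, number of sentiment classes, and the random (untrained) parameters. Training and any orthographic or morphological features are omitted.

```python
import math
import random

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix stored as a list of rows (hypothetical init)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def compose(a, b, V, W):
    """Standard RNTN composition: p_i = tanh(x^T V[i] x + (W x)_i), x = [a; b].
    The bias term is omitted for brevity."""
    x = a + b  # list concatenation gives the stacked vector [a; b]
    n = len(x)
    parent = []
    for i, Vi in enumerate(V):
        # Bilinear tensor term using the i-th 2d x 2d slice of V.
        bilinear = sum(x[j] * Vi[j][k] * x[k] for j in range(n) for k in range(n))
        # Ordinary recursive-network term: row i of W (d x 2d) times x.
        linear = sum(W[i][j] * x[j] for j in range(n))
        parent.append(math.tanh(bilinear + linear))
    return parent

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify(vec, Ws):
    """Sentiment distribution for one node: softmax(Ws @ vec)."""
    return softmax([sum(w * v for w, v in zip(row, vec)) for row in Ws])

def predict_all(tree, emb, V, W, Ws, out):
    """Encode a binary tree bottom-up; a tree is a token string or a
    (left, right) tuple. Appends (node, distribution) for every node in
    post-order, i.e. phrase-level and sentence-level predictions."""
    if isinstance(tree, str):
        vec = emb[tree]
    else:
        left = predict_all(tree[0], emb, V, W, Ws, out)
        right = predict_all(tree[1], emb, V, W, Ws, out)
        vec = compose(left, right, V, W)
    out.append((tree, classify(vec, Ws)))
    return vec

d, num_classes = 4, 3  # hypothetical sizes
rng = random.Random(0)
emb = {tok: [rng.uniform(-0.1, 0.1) for _ in range(d)] for tok in ("tok_a", "tok_b", "tok_c")}
V = [rand_matrix(2 * d, 2 * d, rng) for _ in range(d)]  # d slices, each 2d x 2d
W = rand_matrix(d, 2 * d, rng)
Ws = rand_matrix(num_classes, d, rng)

nodes = []
predict_all((("tok_a", "tok_b"), "tok_c"), emb, V, W, Ws, nodes)
for tree, dist in nodes:
    print(tree, [round(p, 3) for p in dist])
```

Because every internal node gets its own distribution, a phrase-level score (as in the reported phrase-level F1) can be computed from the same forward pass as the sentence-level prediction at the root.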