Conference: The 18th CSI International Symposium on Computer Science and Software Engineering

Translation is not enough: Comparing Lexicon-based methods for sentiment analysis in Persian


Abstract

Sentiment analysis is a subfield of data mining and natural language processing that aims to extract people's opinions and appraisals from their comments on the Web. Unlike machine learning approaches, lexicon-based methods offer important advantages: they are domain-independent, do not need a large annotated training corpus, and are therefore faster. This makes the lexicon-based approach prevalent in the sentiment analysis community. For Persian, however, unlike English, lexicon-based sentiment analysis is still new. Few lexicons are available for sentiment analysis in Persian, and almost all of them are direct translations from English. In the current study, four lexicons are compared to show how much the choice of lexicon affects the performance of document-level sentiment analysis. Specifically, the Persian version of the NRC lexicon, SentiStrength, CNRC, and Adjectives are compared in a pure lexicon-based scenario. Experiments are carried out on the document-level edition of the SPerSent dataset. Results show that the direct translation used in NRC leads to the poorest performance, while the pre-processing and lexicon refinement used in SentiStrength and CNRC improve performance. The results also show that using adjectives alone yields better results than using NRC.
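
To make the "pure lexicon-based scenario" concrete, here is a minimal sketch of document-level scoring. It is not the authors' method: the abstract does not state the scoring rule, so summing word polarities and classifying by the sign of the total is an assumption. The lexicon `TOY_LEXICON` and the functions `score_document` and `classify_document` are hypothetical names, and the four entries are illustrative only, not taken from NRC, SentiStrength, CNRC, or Adjectives.

```python
from typing import Dict

# Hypothetical toy lexicon mapping a Persian word to a polarity score.
# These entries are illustrative and come from none of the compared lexicons.
TOY_LEXICON: Dict[str, int] = {
    "خوب": 1,      # "good"
    "عالی": 2,     # "excellent"
    "بد": -1,      # "bad"
    "افتضاح": -2,  # "terrible"
}


def score_document(text: str, lexicon: Dict[str, int]) -> int:
    """Sum the polarity of each whitespace-separated token found in the lexicon.

    Assumption: tokens absent from the lexicon contribute 0, and no
    normalization, stemming, or negation handling is applied.
    """
    return sum(lexicon.get(token, 0) for token in text.split())


def classify_document(text: str, lexicon: Dict[str, int]) -> str:
    """Label a document by the sign of its summed polarity (assumed rule)."""
    score = score_document(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = "این فیلم خوب بود"  # "this film was good"
    print(classify_document(review, TOY_LEXICON))
```

Under a rule like this, the classifier depends entirely on which words the lexicon covers and what scores it assigns, which is why lexicon quality affects results so directly in the comparison the abstract describes.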
