Published in: ACM Transactions on Asian Language Information Processing

Words Are Important: Improving Sentiment Analysis in the Persian Language by Lexicon Refining


Abstract

Lexicon-based sentiment analysis (SA) extracts people's opinions from their comments on the Web using a predefined lexicon of opinionated words. Unlike the machine learning (ML) approach, lexicon-based methods are domain-independent and do not need a large annotated training corpus, so they are faster, which has made them prevalent in the SA community. The situation is different for Persian: in contrast to English, lexicon-based SA in Persian is a new discipline, and the limited resources available for it make existing lexicon-based methods less accurate than those for other languages. This study first performs an exhaustive investigation of the lexicon-based method. It then introduces two new resources to address the scarcity of SA resources for Persian: PerLex, a carefully labeled lexicon of sentiment words, and PerView, a new manually built dataset of about 16,000 rated documents. It also presents a new hybrid method that combines ML and the lexicon-based approach, in which PerLex words are used to train the ML algorithm. Experiments on the new PerView dataset show that PerLex achieves higher accuracy than the existing CNRC, Adjectives, SentiStrength, PerSent, and LexiPers lexicons, and that using PerLex significantly reduces the execution time of the proposed system compared with those lexicons. Finally, the results show that opinionated lexicon terms, followed by bigrams, are the most effective features for the ML method.
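To make the contrast between plain lexicon-based scoring and lexicon-derived ML features concrete, here is a minimal Python sketch. It assumes a tiny hypothetical lexicon (`TOY_LEXICON`, with English placeholder words and made-up weights; it is not PerLex) and naive whitespace tokenization. `lexicon_score` and `lexicon_label` illustrate generic lexicon-based SA; `hybrid_features` shows one plausible way lexicon terms and bigrams could be turned into feature counts for a classifier. All names here are illustrative, and the abstract does not specify the authors' actual feature construction or ML algorithm.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> polarity weight.
# NOT PerLex; real entries would come from a labeled sentiment lexicon.
TOY_LEXICON = {
    "good": 1,
    "excellent": 2,
    "bad": -1,
    "terrible": -2,
}


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption; real Persian
    text needs proper normalization and tokenization)."""
    return text.lower().split()


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Lexicon-based SA: sum the polarity of every lexicon word in the text."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Map the summed score to a label; a score of 0 is treated as neutral."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def hybrid_features(text, lexicon=TOY_LEXICON):
    """Feature counts for an ML classifier: lexicon terms plus word bigrams.

    Unigram features are restricted to words found in the lexicon;
    bigram features are generic adjacent-word pairs.
    """
    tokens = tokenize(text)
    feats = Counter()
    for tok in tokens:
        if tok in lexicon:
            feats["lex=" + tok] += 1
    for left, right in zip(tokens, tokens[1:]):
        feats["bi=" + left + "_" + right] += 1
    return feats


if __name__ == "__main__":
    review = "The camera is good but the battery is terrible"
    print(lexicon_score(review))  # 1 + (-2) = -1
    print(lexicon_label(review))  # negative
    print(sorted(hybrid_features("not good not good")))
```

The lexicon-only path needs no training data, which is why it is fast and domain-independent; the hybrid path still needs labeled documents, but keeping only lexicon terms as unigram features yields a much smaller feature space than using every word. Since each feature set is a plain mapping of feature name to count, it could be vectorized with a standard tool such as scikit-learn's `DictVectorizer` before training a classifier.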