Journal: 《计算机科学与探索》

A Sentiment Polarity Classification Method Based on Topic Clustering

         

Abstract

Most current methods for sentiment polarity classification extract sentiment features and feed them to a classifier. However, web texts vary widely in expression and are scattered across topics, which makes it increasingly difficult to extract suitable sentiment features. This paper proposes a method to address this problem. First, the texts are clustered by topic using the LDA (latent Dirichlet allocation) topic model. Then, within each topic subset, recurrent neural network language models are trained separately on the positive and the negative samples. Finally, the probability that a text belongs to a topic and the probability that it belongs to a sentiment class are combined to judge its sentiment polarity. Dividing the corpus into topic subcategories regularizes how texts are expressed within each topic and limits the shifts in word meaning that occur across topics. Training language models sidesteps the difficulty of extracting features directly, because feature discovery is internalized in the model training process. Experiments on IMDB movie reviews show that classification accuracy improves significantly once topic clustering is applied.
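The three steps in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and several parts are assumptions made for illustration. The topic probabilities `topic_probs` are taken as given (in the paper they would come from LDA). An add-one smoothed unigram language model stands in for the recurrent neural network language model. The combination rule, which sums P(topic | text) × P(text | sentiment, topic) over topics with equal class priors, is one plausible reading of "combining the two probabilities". All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def train_unigram_lm(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram LM (a stand-in for the per-topic RNN LM)."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}

def log_likelihood(lm, doc):
    # Words outside the vocabulary are skipped in this sketch.
    return sum(lm[w] for w in doc if w in lm)

def classify(doc, topic_probs, models):
    """Pick the sentiment with the larger topic-weighted likelihood.

    topic_probs[k] is P(topic k | doc), assumed to come from a topic model
    and to contain at least one positive entry.
    models[k] is {"pos": lm, "neg": lm}, trained on topic k's documents.
    """
    scores = {}
    for label in ("pos", "neg"):
        # log sum_k P(k | doc) * P(doc | label, k), computed via log-sum-exp
        terms = [math.log(p) + log_likelihood(models[k][label], doc)
                 for k, p in enumerate(topic_probs) if p > 0]
        m = max(terms)
        scores[label] = m + math.log(sum(math.exp(t - m) for t in terms))
    return max(scores, key=scores.get), scores

# Toy corpus, already split by topic (0: acting, 1: plot) and by sentiment.
train = {
    0: {"pos": [["great", "acting"], ["brilliant", "cast"]],
        "neg": [["wooden", "acting"], ["dull", "cast"]]},
    1: {"pos": [["gripping", "plot"]],
        "neg": [["predictable", "plot"]]},
}
vocab = {w for by_label in train.values()
         for docs in by_label.values() for doc in docs for w in doc}
models = {k: {label: train_unigram_lm(docs, vocab)
              for label, docs in by_label.items()}
          for k, by_label in train.items()}

label_a, _ = classify(["brilliant", "acting"], [0.9, 0.1], models)
label_b, _ = classify(["dull", "acting"], [0.9, 0.1], models)
print(label_a, label_b)
```

Because each topic has its own pair of language models, a word such as "acting" is scored only against how it is used within that topic, which is the effect the abstract attributes to dividing the corpus into topic subcategories.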
