Published in: 《计算机科学》 (Computer Science)

Automatic Construction and Optimization of a Sentiment Lexicon Based on Word2Vec

         

Abstract

Constructing sentiment lexicons is a fundamental task in text mining. In recent years, the annotation of sentiment lexicons has been moving from binary positive/negative polarity labels toward multi-category emotion labels, and lexicons have become increasingly domain-specific. However, manually annotating emotion categories is time-consuming and labor-intensive, emotional intensity is difficult to quantify accurately, and an excessive focus on one specific domain greatly limits a lexicon's applicability [1]. This paper trains a neural network language model on a large-scale Chinese corpus and, on that basis, proposes an automatic method for constructing a multidimensional sentiment lexicon based on a set of transformation constraints over groups of Euclidean distances. It then studies a sentiment disambiguation method based on word distribution density, which distinguishes the polarities of words that can carry either a positive or a negative meaning depending on context, and computes the emotion category and intensity separately under each polarity. Finally, it proposes a global optimization scheme that combines multiple human-annotated semantic resources, yielding SentiRuc, a multidimensional Chinese sentiment lexicon with 10 emotion labels. Experiments show that SentiRuc performs well in the category labeling test, the intensity labeling test, sentiment disambiguation, and sentiment classification tasks. In the intensity labeling test, SentiRuc outperforms the second-place result by 23%.
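The general idea of labeling words from embedding geometry can be illustrated with a minimal sketch. This is not the paper's method: the abstract describes constraints over Euclidean distances and a global optimization step, while the sketch below substitutes a simpler nearest-seed rule using cosine similarity. The toy 3-dimensional vectors, the seed words, and the function names `cosine` and `label_word` are all hypothetical; in practice the vectors would come from a trained Word2Vec model.

```python
import math

# Hypothetical toy embeddings standing in for vectors from a trained Word2Vec model.
EMBEDDINGS = {
    "joy": [0.9, 0.1, 0.0],
    "anger": [0.0, 0.9, 0.1],
    "delighted": [0.8, 0.2, 0.1],
    "furious": [0.1, 0.8, 0.2],
}

# Hypothetical seed words for two emotion categories (the paper uses 10 categories).
SEEDS = {"joy": ["joy"], "anger": ["anger"]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_word(word, embeddings, seeds):
    """Return (category, score) for the category whose seeds are most similar on average.

    The score is an assumed proxy for intensity, not the paper's intensity measure.
    """
    vec = embeddings[word]
    scores = {}
    for category, seed_words in seeds.items():
        sims = [cosine(vec, embeddings[s]) for s in seed_words]
        scores[category] = sum(sims) / len(sims)
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    for w in ("delighted", "furious"):
        print(w, label_word(w, EMBEDDINGS, SEEDS))
```

With these toy vectors, "delighted" falls closest to the "joy" seed and "furious" closest to the "anger" seed. A word that can be either positive or negative would need the separate disambiguation step the abstract mentions, which this sketch does not attempt.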
