Journal: 《计算机技术与发展》 (Computer Technology and Development)

Research on a New Word Similarity Algorithm Based on Information Entropy

         

Abstract

Word similarity computation is widely used in natural language processing. To address the problem of implausible similarity results, this paper studies the three levels of HowNet (words, concepts or senses, and sememes) and proposes a new word similarity algorithm that draws on the notion of entropy from information theory. First, a word-surface similarity measure is introduced to select the word set in a principled way. Second, the weight of each sememe is balanced according to its information entropy. This keeps common sememes from taking up a disproportionate share of a word's sememe set, which would otherwise produce results that clearly deviate from reality. Experimental results show that, compared with traditional methods, the proposed method yields no overly absolute scores such as 1.000, which makes the results more reasonable. In addition, the experiments operate on a word set rather than on a single pair of words, which indicates that the comparison is also more efficient.
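The abstract does not give the paper's formulas, so the following is only a minimal sketch of the sememe-weighting idea, not the authors' algorithm. It assumes that a sememe's weight is its self-information, -log2(p), where p is the fraction of words whose sememe set contains it, and that two words are compared by a weighted Jaccard overlap of their sememe sets. The names `sememe_weights` and `weighted_similarity` and the toy `lexicon` are hypothetical and are not taken from HowNet. The word-surface selection step is not covered.

```python
import math
from collections import Counter


def sememe_weights(lexicon):
    """Assumed weighting: self-information -log2(p) of each sememe,
    where p is the fraction of words whose sememe set contains it.
    Sememes shared by many words therefore get small weights."""
    n_words = len(lexicon)
    counts = Counter(s for sememes in lexicon.values() for s in set(sememes))
    return {s: -math.log2(c / n_words) for s, c in counts.items()}


def weighted_similarity(a, b, weights):
    """Assumed comparison: weighted Jaccard overlap of two sememe sets."""
    a, b = set(a), set(b)
    union = sum(weights[s] for s in a | b)
    if union == 0:
        # Every sememe involved carries zero weight; nothing to compare.
        return 0.0
    shared = sum(weights[s] for s in a & b)
    return shared / union


# Toy data for illustration only (hypothetical, not real HowNet entries).
lexicon = {
    "doctor": {"human", "occupation", "cure"},
    "nurse": {"human", "occupation", "care"},
    "teacher": {"human", "occupation", "teach"},
    "patient": {"human", "disease"},
}

weights = sememe_weights(lexicon)
score = weighted_similarity(lexicon["doctor"], lexicon["nurse"], weights)
plain_jaccard = len(lexicon["doctor"] & lexicon["nurse"]) / len(
    lexicon["doctor"] | lexicon["nurse"]
)
print(f"weighted: {score:.3f}, unweighted: {plain_jaccard:.3f}")
```

In this toy data the sememe "human" occurs in every word and so receives zero weight. The weighted score for "doctor" and "nurse" therefore falls below the unweighted Jaccard overlap, which is the kind of damping of common sememes the abstract describes.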
