Mining Hierarchical Relations in Chinese Text Based on Hybrid Cosine Similarity

Journal: 《计算机应用研究》 (Application Research of Computers)

         

Abstract

Hierarchical relations are among the most important relations between concepts in Chinese text, and identifying them correctly is a foundational problem for information-processing tasks such as automatic domain-ontology construction and text data mining. The method first enumerates the candidate hierarchical relations that may hold between concepts. It then builds a kernel-function classifier that combines the semantic cosine similarity of part-of-speech sequences with the cosine similarity of relation words, which turns the mining of hierarchical relations into a classification problem. Next, the classifier is trained on text data annotated with templates. Finally, preprocessed Chinese text is fed in, and the kernel-function classifier decides which of the candidate hierarchical relations hold. Experiments on Chinese text from the air force weapons and equipment domain show that the method is simple and reliable, with good precision and recall.
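The abstract does not spell out the exact form of the hybrid kernel, so the following is only a minimal sketch under stated assumptions. Each candidate relation is represented by a bag of part-of-speech tags (which ignores tag order, a simplification of a true sequence similarity) and a bag of relation words. The two cosine similarities are mixed convexly with a hypothetical weight `lam`. The classifier is a kernel perceptron, swapped in for brevity; it is not necessarily the classifier used in the paper. The `Candidate` class, the feature choices and the toy instances are all illustrative assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate hierarchy-relation instance between two concepts."""
    pos_seq: list[str]         # POS tags around the concept pair (assumed feature)
    relation_words: list[str]  # cue words such as "属于" (assumed feature)


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity of two bag-of-items count vectors; 0.0 if either is empty."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def hybrid_kernel(x: Candidate, y: Candidate, lam: float = 0.5) -> float:
    """Convex mix of POS-tag cosine and relation-word cosine (lam is hypothetical)."""
    return (lam * cosine(x.pos_seq, y.pos_seq)
            + (1 - lam) * cosine(x.relation_words, y.relation_words))


class KernelPerceptron:
    """Binary kernel perceptron, a stand-in for the paper's kernel classifier."""

    def __init__(self, kernel=hybrid_kernel, epochs: int = 10):
        self.kernel = kernel
        self.epochs = epochs
        self.support: list[tuple[Candidate, int]] = []  # (instance, label in {-1, +1})

    def score(self, x: Candidate) -> float:
        # Signed sum of kernel values against every stored mistake.
        return sum(label * self.kernel(s, x) for s, label in self.support)

    def fit(self, xs: list[Candidate], ys: list[int]) -> "KernelPerceptron":
        for _ in range(self.epochs):
            mistakes = 0
            for x, y in zip(xs, ys):
                if y * self.score(x) <= 0:  # misclassified or on the boundary
                    self.support.append((x, y))
                    mistakes += 1
            if mistakes == 0:  # training set separated, stop early
                break
        return self

    def predict(self, x: Candidate) -> int:
        return 1 if self.score(x) > 0 else -1


if __name__ == "__main__":
    # Toy, invented training instances: +1 = hierarchical relation, -1 = not.
    xs = [
        Candidate(["n", "v", "n"], ["属于"]),
        Candidate(["n", "v", "m", "q", "n"], ["是", "一种"]),
        Candidate(["n", "p", "n", "v"], ["部署", "在"]),
        Candidate(["n", "c", "n"], ["和"]),
    ]
    ys = [1, 1, -1, -1]
    clf = KernelPerceptron().fit(xs, ys)
    print(clf.predict(Candidate(["n", "v", "q", "n"], ["是", "一种"])))  # 1
    print(clf.predict(Candidate(["n", "c", "n"], ["和"])))              # -1
```

Because the cosine similarity of count vectors is an inner product of normalized vectors, and a convex combination of valid kernels is again a valid kernel, the same `hybrid_kernel` could also feed an SVM through a precomputed Gram matrix instead of the perceptron used here.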
