Journal: 《计算机工程与应用》 (Computer Engineering and Applications)

Matching Chinese Entity Names by Integrating Multiple Features

Abstract

Entity name matching plays an important role in information system integration, but name variations and clerical errors in Chinese entity names make exact string matching unreliable. Matching methods are therefore needed that can handle the different variants of the same entity name. The similarity of Chinese entity names is measured separately at the character, word, and semantic levels, and a hybrid solution combines these similarities linearly to draw on the strengths of each. To exploit more matching features, two machine learning methods are introduced: the first obtains an optimized ranking list and a best cut point through training; the second uses a Support Vector Machine to judge whether two names refer to the same entity. Experiments on a real dataset of Chinese entity names show that these methods and features effectively improve matching performance.
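The linear combination described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the individual measures, so everything below is an assumption made for illustration: normalized edit distance at the character level, Jaccard overlap of pre-segmented words at the word level, a hypothetical synonym table (`synonyms`) for the semantic level, and arbitrary example weights. It is not the authors' implementation.

```python
from typing import Dict, List, Sequence, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character level (assumed measure): 1 - normalized edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Set overlap: |x & y| / |x | y|, defined as 1.0 for two empty sets."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def word_similarity(wa: List[str], wb: List[str]) -> float:
    """Word level (assumed measure): Jaccard overlap of segmented words."""
    return jaccard(set(wa), set(wb))


def semantic_similarity(wa: List[str], wb: List[str],
                        synonyms: Dict[str, str]) -> float:
    """Semantic level (assumed measure): map each word to a canonical form
    through a hypothetical synonym table, then take the Jaccard overlap."""
    return jaccard({synonyms.get(w, w) for w in wa},
                   {synonyms.get(w, w) for w in wb})


def combined_similarity(wa: List[str], wb: List[str],
                        synonyms: Dict[str, str],
                        weights: Sequence[float] = (0.4, 0.3, 0.3)) -> float:
    """Linear combination of the three levels; the weights are illustrative."""
    a, b = "".join(wa), "".join(wb)
    scores = (char_similarity(a, b),
              word_similarity(wa, wb),
              semantic_similarity(wa, wb, synonyms))
    return sum(w * s for w, s in zip(weights, scores))
```

The names are passed in already segmented into words, since word segmentation is outside the scope of this sketch. A pair would then be accepted as a match when `combined_similarity` exceeds a cut point, which in the approach described above is obtained through training rather than fixed by hand.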
