Journal: 《计算机与现代化》 (Computer and Modernization)

Entity Alignment of Encyclopedia Knowledge Bases Based on Semi-Supervised Co-Training

         

Abstract

Supervised learning approaches to entity alignment depend on large amounts of labeled data, and their feature representations are poorly suited to encyclopedia knowledge bases. To address these problems, this paper proposes an entity alignment method based on semi-supervised co-training. Entity alignment is modeled as a constrained binary classification problem. Multi-dimensional features are built by combining entity names, attributes, description texts, and key information extracted from those texts, such as temporal and numerical values. The features are split into two relatively independent views, and the classifiers on the two views are co-trained, iteratively learning the distribution of synonymous entities from unlabeled data. Experiments on two Chinese encyclopedias show that the method reaches an F1 score of 84.3% for entity alignment, the best result among the compared methods, which demonstrates its effectiveness and its practical value for encyclopedia knowledge bases.
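The co-training loop described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, since the abstract does not specify it: the nearest-centroid classifier, the confidence measure, the `per_round` selection rule, the two-feature layout of views `"a"` (name and attribute similarity) and `"b"` (description-text similarity and temporal/numeric agreement), and the toy values. The constraint on the binary classification problem is not detailed in the abstract and is omitted.

```python
from math import dist

def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(labeled, view):
    """Fit a nearest-centroid classifier on one view (hypothetical stand-in
    for whatever classifier the paper uses)."""
    pos = [x[view] for x, y in labeled if y == 1]
    neg = [x[view] for x, y in labeled if y == 0]
    return centroid(neg), centroid(pos)

def predict(model, features):
    """Return (label, confidence); confidence is the gap between the
    distances to the negative and positive centroids."""
    neg_c, pos_c = model
    margin = dist(features, neg_c) - dist(features, pos_c)
    return (1 if margin > 0 else 0), abs(margin)

def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Each round, the classifier on each view labels its most confident
    unlabeled pairs and adds them to the shared labeled pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in ("a", "b"):
            if not unlabeled:
                break
            model = train(labeled, view)
            ranked = sorted(unlabeled,
                            key=lambda x: predict(model, x[view])[1],
                            reverse=True)
            for x in ranked[:per_round]:
                label, _ = predict(model, x[view])
                labeled.append((x, label))
                unlabeled.remove(x)
    return train(labeled, "a"), train(labeled, "b"), labeled

# Toy candidate entity pairs (made-up values). Label 1 = same entity.
seed = [
    ({"a": [0.9, 0.8], "b": [0.9, 1.0]}, 1),
    ({"a": [0.1, 0.2], "b": [0.2, 0.0]}, 0),
]
pool = [
    {"a": [0.8, 0.9], "b": [0.7, 1.0]},
    {"a": [0.2, 0.1], "b": [0.1, 0.0]},
    {"a": [0.7, 0.6], "b": [0.8, 1.0]},
    {"a": [0.3, 0.3], "b": [0.3, 0.0]},
]

model_a, model_b, final = co_train(seed, pool)
```

Keeping the two views separate lets a pair that is ambiguous in one view (for example, similar names) be labeled by the other view (for example, matching dates in the description text), which is the usual motivation for co-training when labeled data is scarce.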
