Journal: Data & Knowledge Engineering

Entity matching across heterogeneous data sources: An approach based on constrained cascade generalization

Abstract

To integrate or link the data stored in heterogeneous data sources, a critical problem is entity matching, i.e., matching records across the sources that represent semantically corresponding entities in the real world. While decision tree techniques have been used to learn entity matching rules, most decision tree learners have an inherent representational bias: they generate univariate trees and restrict the decision boundaries to axis-orthogonal hyperplanes in the feature space. Cascading other classification methods with decision tree learners can alleviate this bias and potentially increase classification accuracy. In this paper, the authors apply a recently developed constrained cascade generalization method to entity matching and report an empirical evaluation using real-world data. The evaluation results show that this method outperforms the base classification methods in terms of classification accuracy, especially in the case with the dirtiest data.
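The cascading idea in the abstract can be illustrated with a minimal sketch, assuming a toy setting. It shows only the basic, unconstrained form of cascade generalization: a base learner (here plain logistic regression) is trained on record-pair similarity features, its predicted match probability is appended to each feature vector, and a univariate decision stump is then fitted on the extended vector. The field names `name` and `address`, the difflib-based similarity measure, and every function name below are hypothetical; the constraints that distinguish the paper's constrained cascade generalization method are not reproduced here.

```python
import math
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] using difflib's ratio (an assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(r1: dict, r2: dict) -> list[float]:
    """Hypothetical comparison vector: one similarity score per shared field."""
    return [similarity(r1[f], r2[f]) for f in ("name", "address")]


def predict_proba(w: list[float], x: list[float]) -> float:
    """Logistic model output; the last weight in w is the bias term."""
    z = sum(wj * v for wj, v in zip(w, x + [1.0]))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Base learner: plain batch gradient-descent logistic regression."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = predict_proba(w, xi)
            for j, v in enumerate(xi + [1.0]):
                grad[j] += (p - yi) * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def train_stump(X, y):
    """Univariate (axis-orthogonal) stump: predict a match when x[j] >= t.

    Picks the (feature, threshold) pair with the fewest training errors.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            errors = sum((x[j] >= t) != bool(yi) for x, yi in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, j, t)
    return best[1], best[2]


def cascade_fit(X, y):
    """Cascade step: append the base learner's probability, then fit the stump."""
    w = train_logistic(X, y)
    X_aug = [x + [predict_proba(w, x)] for x in X]
    return w, train_stump(X_aug, y)


def cascade_predict(model, x):
    """Extend x with the base learner's output and apply the learned stump."""
    w, (j, t) = model
    x_aug = x + [predict_proba(w, x)]
    return int(x_aug[j] >= t)


if __name__ == "__main__":
    # Toy comparison vectors [name_sim, address_sim]; label 1 = same entity.
    X = [[0.9, 0.4], [0.4, 0.9], [0.7, 0.7], [0.95, 0.95],
         [0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.2, 0.2]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    model = cascade_fit(X, y)
    print("stump splits on feature index", model[1][0])
    print("predictions:", [cascade_predict(model, x) for x in X])
```

In this toy data, no single threshold on a raw feature can separate matches from non-matches: pairs such as (0.9, 0.4) and (0.9, 0.1) share the same first coordinate but carry different labels. A threshold on the appended probability, however, is a threshold on the linear score w·x + bias, which corresponds to an oblique hyperplane in the original feature space. This is the representational gain that cascading offers a univariate tree learner.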