ACM KDD International Conference on Knowledge Discovery and Data Mining (KDD 2008)

A Unified Approach for Schema Matching, Coreference and Canonicalization


Abstract

The automatic consolidation of database records from many heterogeneous sources into a single repository requires solving several information integration tasks. Although tasks such as coreference, schema matching, and canonicalization are closely related, they are most commonly studied in isolation. Systems that do tackle multiple integration problems traditionally solve each independently, allowing errors to propagate from one task to another. In this paper, we describe a discriminatively trained model that reasons about schema matching, coreference, and canonicalization jointly. We evaluate our model on a real-world data set of people and demonstrate that simultaneously solving these tasks reduces errors over a cascaded or isolated approach. Our experiments show that a joint model is able to improve substantially over systems that solve each task either in isolation or with the conventional cascade. We demonstrate nearly a 50% error reduction for coreference and a 40% error reduction for schema matching.
