IEEE Transactions on Knowledge and Data Engineering

The Interaction Between Schema Matching and Record Matching in Data Integration

Abstract

Schema Matching (SM) and Record Matching (RM) are two necessary steps in integrating multiple relational tables with different schemas, where SM unifies the schemas and RM detects records referring to the same real-world entity. The two processes have been thoroughly studied separately, but little attention has been paid to the interaction between SM and RM. In this work, we find that, even when they are alternated in a simple manner, SM and RM can benefit from each other and reach better integration performance (i.e., in terms of precision and recall). Therefore, combining SM and RM is a promising solution for improving data integration. To this end, we define novel matching rules for SM and RM, respectively: every SM decision is made based on intermediate RM results, and vice versa, so that SM and RM can be performed alternately. The quality of integration is guaranteed by a Matching Likelihood Estimation model and the control of semantic drift, which together prevent mismatch magnification. To reduce the computational cost, we design an index structure based on q-grams and a greedy search algorithm that can reduce the overhead of the interaction by around 90 percent. Extensive experiments on three data collections show that the combination and interaction of SM and RM significantly outperform previous works that conduct SM and RM separately.
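
The abstract does not describe the index structure itself, so the following is only a generic sketch of how a q-gram inverted index can limit the number of record pairs that need comparing. It is not the authors' design: the function names, the `#` padding character, and the `min_shared` threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Return the set of q-grams of a lowercased string, padded with '#'."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_index(values, q=2):
    """Map each q-gram to the ids of the values that contain it."""
    index = defaultdict(set)
    for vid, value in enumerate(values):
        for gram in qgrams(value, q):
            index[gram].add(vid)
    return index


def candidate_pairs(values, q=2, min_shared=3):
    """Return pairs of value ids that share at least `min_shared` q-grams.

    Only values that co-occur in some posting list are ever paired, so
    pairs with no q-gram in common are never examined.
    """
    index = build_index(values, q)
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


# Hypothetical attribute values; the threshold of 3 is arbitrary.
names = ["John Smith", "Jon Smith", "Mary Jones"]
print(candidate_pairs(names))  # {(0, 1)}
```

In this toy input, "John Smith" and "Jon Smith" share nine bigrams and are kept as a candidate pair, while "Mary Jones" shares at most two bigrams with either and is filtered out before any detailed comparison.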
