Corpus-based schema matching

Abstract

Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate. Past solutions have proposed a principled combination of multiple algorithms. However, these solutions sometimes perform rather poorly due to the lack of sufficient evidence in the schemas being matched. In this paper we show how a corpus of schemas and mappings can be used to augment the evidence about the schemas being matched, so they can be matched better. Such a corpus typically contains multiple schemas that model similar concepts and hence enables us to learn variations in the elements and their properties. We exploit such a corpus in two ways. First, we increase the evidence about each element being matched by including evidence from similar elements in the corpus. Second, we learn statistics about elements and their relationships and use them to infer constraints that we use to prune candidate mappings. We also describe how to use known mappings to learn the importance of domain and generic constraints. We present experimental results that demonstrate corpus-based matching outperforms direct matching (without the benefit of a corpus) in multiple domains.
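
The first use of the corpus described above (adding evidence from similar corpus elements before matching) can be illustrated with a minimal sketch. It is not the authors' algorithm: the token-based evidence, the overlap and Jaccard measures, the 0.5 threshold, and the names `tokens`, `augment`, and `CORPUS` are all assumptions chosen for illustration, and the tiny corpus is made up.

```python
import re

def tokens(name):
    """Split an element name such as 'custPhone' or 'telephone_no' into lowercase tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return {p.lower() for p in parts}

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap(a, b):
    """Overlap coefficient: shared tokens relative to the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

# Hypothetical corpus: each entry pools the tokens of elements that
# earlier schemas and mappings showed to describe the same concept.
CORPUS = [
    {"phone", "telephone", "tel", "number", "no"},
    {"address", "addr", "street"},
]

def augment(element_tokens, corpus, threshold=0.5):
    """Add tokens from every corpus entry that is similar enough to the element."""
    evidence = set(element_tokens)
    for entry in corpus:
        if overlap(element_tokens, entry) >= threshold:
            evidence |= entry
    return evidence

def direct_score(name_a, name_b):
    """Match using only the evidence found in the two schemas."""
    return jaccard(tokens(name_a), tokens(name_b))

def corpus_score(name_a, name_b, corpus=CORPUS):
    """Match after augmenting both elements with corpus evidence."""
    return jaccard(augment(tokens(name_a), corpus),
                   augment(tokens(name_b), corpus))

if __name__ == "__main__":
    print("direct:", direct_score("custPhone", "telephone_no"))
    print("corpus:", corpus_score("custPhone", "telephone_no"))
```

In this toy case the two names share no tokens, so the direct score is zero, while the corpus entry for phone numbers links them. The second use of the corpus (learning statistics and inferring constraints to prune candidate mappings) is not covered by this sketch.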