A holistic paradigm for large scale schema matching.

Abstract

Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, matching multiple schemas has relied on finding pairwise attribute correspondences in isolation. In contrast, this thesis proposes a new matching paradigm, holistic schema matching, which matches many schemas at the same time and finds all matchings at once. By handling a set of schemas together, we can exploit their context information, which reflects the semantic correspondences among attributes. Such information is not available when schemas are matched only in pairs. To realize holistic schema matching, we develop two approaches in sequence. To begin with, we develop the MGS framework, which finds simple 1:1 matchings by viewing schema matching as hidden model discovery. Then, to deal with complex matchings, we develop the DCM framework by abstracting schema matching as correlation mining. Further, to automate the entire matching process, we combine the DCM framework with automatically extracted interfaces and find that the inevitable errors in automatic interface extraction may significantly affect the matching result. To make the DCM framework robust against such "noisy" schemas, we propose to integrate it with an ensemble approach that randomizes the schema data into multiple DCM matchers and aggregates their ranked results by majority voting. Last, as our matching algorithms require large-scale schemas in the same domain (e.g., Books and Airfares) as input, we develop an object-focused crawler for effectively collecting query interfaces and a model-differentiation-based clustering approach for organizing schemas into their domain hierarchy.
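
The ensemble idea in the abstract (run a matcher on several random samples of the schemas, then keep what most runs agree on) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_matcher` is a hypothetical stand-in that pairs attributes which are each frequent but never co-occur, not the DCM algorithm, and the vote is taken over unranked match sets rather than the ranked results the thesis aggregates. The function names, thresholds, and sample sizes are invented.

```python
import random
from collections import Counter
from itertools import combinations


def toy_matcher(schemas, min_support=2):
    """Hypothetical stand-in for a holistic matcher (NOT the DCM algorithm).

    Proposes (a, b) as a match when both attributes appear in at least
    `min_support` schemas but never appear together in the same schema.
    """
    counts = Counter(attr for schema in schemas for attr in set(schema))
    cooccur = Counter()
    for schema in schemas:
        for a, b in combinations(sorted(set(schema)), 2):
            cooccur[(a, b)] += 1
    matches = set()
    for a, b in combinations(sorted(counts), 2):
        frequent = counts[a] >= min_support and counts[b] >= min_support
        if frequent and cooccur[(a, b)] == 0:
            matches.add((a, b))
    return matches


def ensemble_match(schemas, trials=9, sample_size=7, seed=0):
    """Run the matcher on random samples and keep majority-voted matches.

    Simplification: votes are counted over match sets, not ranked lists.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(trials):
        sample = rng.sample(schemas, sample_size)  # sample without replacement
        votes.update(toy_matcher(sample))
    return {match for match, n in votes.items() if n > trials / 2}


if __name__ == "__main__":
    # Made-up query-interface schemas for a Books-like domain.
    schemas = [["author", "title"]] * 5 + [["writer", "title"]] * 5
    print(ensemble_match(schemas))
```

Sampling gives each run a chance to exclude a badly extracted schema, and voting discards matches that only a few runs support; how well this works depends on the sample size and number of trials, which are arbitrary in this sketch.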

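Bibliographic record

  • Author: He, Bin
  • Author affiliation: University of Illinois at Urbana-Champaign
  • Degree-granting institution: University of Illinois at Urbana-Champaign
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 193 p.
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology; computer technology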
