International Conference on Very Large Data Bases (VLDB 2008)

Analyzing and Revising Data Integration Schemas to Improve Their Matchability

Abstract

Data integration systems often provide a uniform query interface, called a mediated schema, to a multitude of data sources. To answer user queries, such systems employ a set of semantic matches between the mediated schema and the data-source schemas. Finding such matches is well known to be difficult. Hence much work has focused on developing semi-automatic techniques to find the matches efficiently. In this paper we consider the complementary problem of improving the mediated schema to make finding such matches easier. Specifically, a mediated schema S will typically be matched with many source schemas. Thus, can the developer of S analyze and revise S in a way that preserves S's semantics and yet makes it easier to match against in the future?

In this paper we provide an affirmative answer to the above question and outline a promising solution direction, called mSeer. Given a mediated schema S and a matching tool M, mSeer first computes a matchability score that quantifies how well S can be matched against using M. Next, mSeer uses this score to generate a matchability report that identifies the problems in matching S. Finally, mSeer addresses these problems by automatically suggesting changes to S (e.g., renaming an attribute or reformatting data values) that it believes will preserve the semantics of S and yet make it more amenable to matching. We present extensive experiments over several real-world domains that demonstrate the promise of the proposed approach.
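The three steps described above (score, report, suggested changes) can be illustrated with a minimal sketch. The abstract does not say how mSeer computes its score or generates its suggestions, so everything here is an assumption made for illustration: the matching tool M is replaced by a toy name-similarity matcher built on Python's `difflib`, the score is simply the fraction of source attributes matched correctly against known gold mappings, and the functions `name_similarity`, `match`, `matchability`, and `suggest_renames`, as well as the sample schemas, are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy stand-in for the matching tool M: it compares attribute
# names only, using difflib's string similarity ratio.
def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(mediated: dict[str, str], source_attrs: list[str]) -> dict[str, str]:
    """Map each source attribute to the id of the most similar mediated attribute."""
    return {
        s: max(mediated, key=lambda attr_id: name_similarity(mediated[attr_id], s))
        for s in source_attrs
    }

def matchability(mediated, sources):
    """Return (score, errors). The score is the fraction of source attributes
    that the toy matcher maps to the correct mediated attribute; errors counts
    the misses per mediated attribute id (a crude matchability report)."""
    correct = total = 0
    errors = {attr_id: 0 for attr_id in mediated}
    for gold in sources:  # gold: {source attribute name: mediated attribute id}
        predicted = match(mediated, list(gold))
        for source_attr, true_id in gold.items():
            total += 1
            if predicted[source_attr] == true_id:
                correct += 1
            else:
                errors[true_id] += 1
    return (correct / total if total else 0.0), errors

def suggest_renames(mediated, sources, candidates):
    """Try each candidate rename on its own and keep those that raise the score."""
    baseline, _ = matchability(mediated, sources)
    suggestions = []
    for attr_id, names in candidates.items():
        for new_name in names:
            revised = {**mediated, attr_id: new_name}
            score, _ = matchability(revised, sources)
            if score > baseline:
                suggestions.append((attr_id, new_name, score - baseline))
    return sorted(suggestions, key=lambda s: -s[2])

# Illustrative data (assumed): a mediated schema with cryptic attribute names,
# two source schemas with gold mappings, and candidate renames from a developer.
mediated_schema = {"loc": "loc", "amt": "amt"}
source_schemas = [
    {"address": "loc", "price": "amt"},
    {"street_address": "loc", "list_price": "amt"},
]
rename_candidates = {"loc": ["address"], "amt": ["price"]}

score, report = matchability(mediated_schema, source_schemas)
print(f"matchability score: {score:.2f}, misses per attribute: {report}")
for attr_id, new_name, gain in suggest_renames(
    mediated_schema, source_schemas, rename_candidates
):
    print(f"suggest renaming {attr_id!r} to {new_name!r} (score gain {gain:.2f})")
```

The sketch keeps attribute ids separate from attribute names so that a rename changes only what the matcher sees, not what the attribute means. It does not check semantic preservation itself; the candidate names are assumed to come from someone who knows they are synonyms.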
