Analyzing and Revising Data Integration Schemas to Improve Their Matchability

Abstract

Data integration systems often provide a uniform query interface, called a mediated schema, to a multitude of data sources. To answer user queries, such systems employ a set of semantic matches between the mediated schema and the data-source schemas. Finding such matches is well known to be difficult. Hence much work has focused on developing semi-automatic techniques to find the matches efficiently. In this paper we consider the complementary problem of improving the mediated schema to make finding such matches easier. Specifically, a mediated schema S will typically be matched with many source schemas. Thus, can the developer of S analyze and revise S in a way that preserves S's semantics, and yet makes it easier to match with in the future?

In this paper we provide an affirmative answer to the above question, and outline a promising solution direction, called mSeer. Given a mediated schema S and a matching tool M, mSeer first computes a matchability score that quantifies how well S can be matched against using M. Next, mSeer uses this score to generate a matchability report that identifies the problems in matching S. Finally, mSeer addresses these problems by automatically suggesting changes to S (e.g., renaming an attribute, reformatting data values, etc.) that it believes will preserve the semantics of S and yet make it more amenable to matching. We present extensive experiments over several real-world domains that demonstrate the promise of the proposed approach.
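The three steps named in the abstract (score, report, suggested revisions) can be illustrated with a toy sketch. This is not mSeer's actual algorithm, which the abstract does not specify. The name-similarity matcher standing in for M, the hand-written gold mappings, the use of accuracy as the score, and every function name below are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def toy_matcher(mediated, source_attrs):
    """Hypothetical matching tool M: map each source attribute to the id of
    the mediated attribute whose name is most similar.

    mediated: dict of attribute id -> attribute name
    """
    return {
        s: max(mediated, key=lambda attr_id: name_similarity(mediated[attr_id], s))
        for s in source_attrs
    }


def matchability(mediated, gold_mappings):
    """Assumed score: fraction of gold matches that the toy matcher recovers.

    gold_mappings: one dict per source schema, source attribute -> mediated id
    """
    correct = total = 0
    for gold in gold_mappings:
        predicted = toy_matcher(mediated, list(gold))
        correct += sum(predicted[s] == gold[s] for s in gold)
        total += len(gold)
    return correct / total


def suggest_renames(mediated, gold_mappings, candidates):
    """Try each candidate rename in isolation; keep those that raise the score.

    candidates: dict of mediated id -> list of alternative names
    Returns (baseline score, list of (attr_id, new_name, new_score)).
    """
    base = matchability(mediated, gold_mappings)
    suggestions = []
    for attr_id, names in candidates.items():
        for name in names:
            revised = {**mediated, attr_id: name}  # rename one attribute only
            score = matchability(revised, gold_mappings)
            if score > base:
                suggestions.append((attr_id, name, score))
    return base, suggestions
```

In this sketch the baseline score plays the role of the matchability score, and the list returned by `suggest_renames` is a very reduced form of a report with suggested changes. Renaming only the attribute name while keeping the attribute id fixed is one simple way to model a revision that leaves the schema's semantics untouched.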


Bibliographic details

Source: International Conference on Very Large Data Bases (VLDB 2008), 2008, pp. 772-783 (12 pages)
Authors: Xiaoyong Chai; Mayssam Sayyadian; AnHai Doan
Original format: PDF

Similar documents

1. An Automatic Domain Independent Schema Matching in Integrating Schemas of Heterogeneous Relational Databases [J]. Hamidah Ibrahim, Yaser Karasneh, Meghdad Mirabi. Journal of information science and engineering. 2014, No. 5
2. Learning to Match the Schemas of Data Sources: A Multistrategy Approach [J]. AnHai Doan, Pedro Domingos, Alon Halevy. Machine Learning. 2003, No. 3
3. Integrating Related XML Data into Multiple Data Warehouse Schemas [J]. Soumya Sen, Ranak Ghosh, Debanjali Paul. Computer Science & Information Technology. 2012, No. 1
4. Analyzing and Revising Data Integration Schemas to Improve Their Matchability [C]. Xiaoyong Chai, Mayssam Sayyadian, AnHai Doan. International conference on very large data bases. 2008
5. Comparing semantic matching results of schema matchers and metadata registry enabled systems [D]. Sabados, William Thomas. 2010
6. PlasmoDB: An integrative database of the Plasmodium falciparum genome. Tools for accessing and analyzing finished and unfinished sequence data [O]. 2001
7. Analyzing and Revising Data Integration Schemas to Improve Their Matchability [O]. Xiaoyong Chai et al. 2008
