Conference: International Conference on Signal-Image Technology and Internet-Based Systems

Scalability of Source Identification in Data Integration Systems

Abstract

Given a large number of data sources, each indexed by attributes from a predefined set A, and a query q over a subset Q of A containing k attributes, we want to identify every combination of sources whose attributes together cover Q. Each such combination c may lead to a rewriting of q as a join over the sources in c. To limit redundancy and combinatorial explosion, we further require each combination to be a minimal cover of Q, that is, a cover that stops covering Q as soon as any one of its sources is removed. Although this work is motivated by query rewriting in OpenXView [3], an XML data integration system with a large number of XML sources, we believe the solutions presented in this paper apply to other scalable data integration schemes. We focus on the case where the number of sources is very large while the queries are small. We propose a novel algorithm for computing the set of minimal covers of a query and evaluate its performance experimentally.
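To make the problem statement concrete, the following is a minimal brute-force sketch of minimal-cover enumeration. It is not the algorithm proposed in the paper, which the abstract does not describe. The function name `minimal_covers`, the dictionary representation of sources, and the example data are all assumptions made for illustration.

```python
from itertools import combinations


def minimal_covers(sources, query):
    """Enumerate all minimal covers of `query` by brute force.

    sources: dict mapping a (hypothetical) source name to the set of
             attributes that index it.
    query:   the attribute set Q that the query needs.
    """
    query = frozenset(query)
    # Only the part of each source that overlaps Q matters for covering.
    relevant = {name: frozenset(attrs) & query for name, attrs in sources.items()}
    relevant = {name: attrs for name, attrs in relevant.items() if attrs}

    def covered(names):
        # Union of the Q-attributes provided by the given sources.
        return frozenset().union(*(relevant[n] for n in names))

    covers = []
    # A minimal cover holds at most |Q| sources, because each member must
    # supply at least one attribute of Q that no other member supplies.
    for size in range(1, len(query) + 1):
        for combo in combinations(sorted(relevant), size):
            if covered(combo) != query:
                continue
            # Minimal: removing any single source must break the cover.
            if all(covered([m for m in combo if m != n]) != query for n in combo):
                covers.append(set(combo))
    return covers


# Illustrative data only: five sources over attributes a, b, c, d.
example_sources = {
    "s1": {"a", "b"},
    "s2": {"b", "c"},
    "s3": {"c"},
    "s4": {"a", "b", "c"},
    "s5": {"d"},
}
print(minimal_covers(example_sources, {"a", "b", "c"}))
```

On this example the minimal covers are {s4}, {s1, s2} and {s1, s3}. The combination {s1, s2, s3} covers Q but is not minimal, since s2 can be dropped. The sketch examines every combination of up to k relevant sources, so its cost grows roughly as n^k for n relevant sources, which illustrates why scalability is a concern when the number of sources is very large even though queries are small.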
