13th International Conference on Extending Database Technology, 2010

Subsumption and Complementation as Data Fusion Operators


Abstract

The goal of data fusion is to combine several representations of one real-world object into a single, consistent representation, e.g., in data integration. A very popular operator for performing data fusion is the minimum union operator. It is defined as the outer union followed by the removal of subsumed tuples. Minimum union is used in other applications as well, for instance in database query optimization to rewrite outer join queries and in the semantic web community to implement SPARQL's OPTIONAL operator. Despite its wide applicability, there are only a few efficient implementations, and until now, minimum union has not been a relational database primitive.

This paper fills this gap: we present implementations of subsumption that serve as a building block for minimum union. Furthermore, we consider this operator as a database primitive and show how to optimize query plans in the presence of subsumption and minimum union through rule-based plan transformations. Experiments on both artificial and real-world data show that our algorithms outperform existing algorithms used for subsumption in terms of runtime and that they scale to large volumes of data.

In the context of data integration, we observe that performing data fusion calls for more than subsumption and minimum union. Therefore, another contribution of this paper is the definition of the complementation and complement union operators. Intuitively, these operators allow merging tuples that have complementing values and thus eliminate unnecessary null values.
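To make the operators named in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it uses a naive quadratic scan, and all function names are hypothetical. It assumes the usual reading of subsumption (a tuple is subsumed if another tuple agrees with it on all of its non-null attributes and has strictly fewer nulls). The `complements` predicate is a simplified assumption based only on the abstract's intuition that complementing tuples can be merged to remove nulls; the paper's formal definition may impose further conditions.

```python
def outer_union(r, s):
    """Union over the combined schema; missing attributes become None (NULL)."""
    rows = list(r) + list(s)
    schema = sorted({attr for row in rows for attr in row})
    seen, out = set(), []
    for row in rows:
        t = tuple(row.get(attr) for attr in schema)
        if t not in seen:  # set semantics: drop exact duplicates
            seen.add(t)
            out.append(t)
    return schema, out


def subsumes(t1, t2):
    """True if t1 subsumes t2: they agree wherever t2 is non-NULL,
    and t2 has strictly more NULLs than t1."""
    agree = all(b is None or a == b for a, b in zip(t1, t2))
    more_nulls = sum(b is None for b in t2) > sum(a is None for a in t1)
    return agree and more_nulls


def remove_subsumed(rows):
    """Naive O(n^2) subsumption: keep only tuples no other tuple subsumes."""
    return [t for t in rows if not any(subsumes(u, t) for u in rows)]


def minimum_union(r, s):
    """Outer union followed by removal of subsumed tuples."""
    schema, rows = outer_union(r, s)
    return schema, remove_subsumed(rows)


def complements(t1, t2):
    """Assumed, simplified reading: no conflicting non-NULL values, and each
    tuple supplies at least one value where the other has NULL."""
    no_conflict = all(a is None or b is None or a == b for a, b in zip(t1, t2))
    t2_fills_t1 = any(a is None and b is not None for a, b in zip(t1, t2))
    t1_fills_t2 = any(b is None and a is not None for a, b in zip(t1, t2))
    return no_conflict and t2_fills_t1 and t1_fills_t2


def merge(t1, t2):
    """Combine two complementing tuples attribute by attribute."""
    return tuple(a if a is not None else b for a, b in zip(t1, t2))


# Illustrative data only (not from the paper).
source_a = [{"name": "Ann", "city": "Berlin"}]
source_b = [{"name": "Ann", "phone": "123"}, {"name": "Ann"}]

schema, fused = minimum_union(source_a, source_b)
if len(fused) == 2 and complements(fused[0], fused[1]):
    fused = [merge(fused[0], fused[1])]
```

In this example, minimum union drops the tuple that carries only the name, because the tuple with name and city subsumes it. The two remaining tuples do not subsume each other, which is the situation the abstract describes as calling for complementation: merging them yields a single tuple without nulls.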
