Conference: Foundations of Information and Knowledge Systems

Skyline Cardinality for Relational Processing: How Many Vectors Are Maximal?


Abstract

The skyline clause, also called the Pareto clause, has recently been proposed as an extension to SQL. It selects the tuples that are Pareto optimal with respect to a set of designated skyline attributes. This is the maximal vector problem in a relational context, but it represents a powerful extension to SQL which allows for the natural expression of on-line analytic processing (OLAP) queries and preferences in queries. Cardinality estimation of skyline sets is the focus of this work. A better understanding of skyline cardinality, and of other properties of the skyline, is useful for designing better skyline algorithms, is necessary to extend a query optimizer's cost model to accommodate skyline queries, and helps clarify how to use skyline effectively for OLAP and preference queries. Within a basic model that assumes sparseness of values on attributes' domains and statistical independence across attributes, we establish the expected skyline cardinality for skyline queries. While asymptotic bounds have been established previously, they are neither widely known nor applied in skyline work. We show concrete estimates, as would be needed in a cost model, and consider the nature of the distribution of the skyline. We next establish the effects on skyline cardinality as the constraints of our basic model are relaxed. Some of the results are quite counter-intuitive, and understanding them is critical to skyline's use in OLAP and preference queries. We consider the case in which attributes' values repeat on their domains, and show that the number of skyline tuples is diminished. We consider the effects of having Zipfian distributions on the attributes' domains, and generalize the expectation for other distributions. Last, we consider the ramifications of correlation across the attributes.
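
As a rough illustration of the quantities the abstract refers to, the sketch below computes a skyline by brute force and evaluates a standard closed form for the expected number of maximal vectors among n tuples over d attributes, when values are distinct (sparse) and attributes are independent: the generalized harmonic number H_{d-1,n}, with H_{0,n} = 1 and H_{k,n} = sum over i = 1..n of H_{k-1,i} / i. The abstract does not state this formula, so presenting it here is an assumption about which estimate is meant. The function names `dominates`, `skyline`, and `expected_skyline_size` are hypothetical, and "larger is better" on every attribute is also an assumption.

```python
from fractions import Fraction


def dominates(a, b):
    """True if a is at least as good as b on every attribute and strictly
    better on at least one (assumption: larger values are preferred)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(tuples):
    """Naive O(n^2) skyline: keep every tuple that no other tuple dominates.
    This is an illustrative baseline, not an algorithm from the paper."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


def expected_skyline_size(n, d):
    """Generalized harmonic number H_{d-1,n}, computed exactly for n >= 1.

    Recurrence: H_{0,n} = 1 and H_{k,n} = sum_{i=1..n} H_{k-1,i} / i.
    Assumes distinct values per attribute and independent attributes.
    """
    h = [Fraction(1)] * (n + 1)  # h[i] holds H_{0,i}; index 0 is unused
    for _ in range(d - 1):
        nxt = [Fraction(0)] * (n + 1)
        running = Fraction(0)
        for i in range(1, n + 1):
            running += h[i] / i  # prefix sum of H_{k-1,i} / i
            nxt[i] = running
        h = nxt
    return h[n]


if __name__ == "__main__":
    rows = [(1, 3), (3, 1), (2, 2), (1, 1)]
    print(skyline(rows))                 # (1, 1) is dominated by (2, 2)
    print(expected_skyline_size(2, 3))   # two tuples, three attributes
```

For two tuples over three attributes the recurrence gives 7/4, which matches a direct count: each tuple dominates the other with probability 1/8, so the expected number of survivors is 2 - 2/8. Exact fractions keep small cases checkable by hand; for table-sized n, floating-point arithmetic would be the practical choice.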
