首页> 外文会议>International conference on database theory >Detecting and Exploiting Near-Sortedness for Efficient Relational Query Evaluation
【24h】

Detecting and Exploiting Near-Sortedness for Efficient Relational Query Evaluation

机译:检测和利用高效关系查询评估的近分类

获取原文

摘要

Many relational operations are best performed when the relations are stored sorted over the relevant attributes (e.g. the common attributes in a natural join operation). However, generally relations are not stored sorted because it is expensive to maintain them this way (and impossible whenever there is more than one relevant sort key). Still, many times relations turn out to be nearly-sorted, where most tuples are close to their place in the order. This state can result from "leftover sortedness", where originally sorted relations were updated, or were combined into interim results when evalu ating a complex query. It can also result from weak correlations between attribute values. Currently, nearly-sorted relations are treated the same as unsorted relations, and when relational operations are evaluated for them, a generic algorithm is used. Yet, many operations can be computed more efficiently by an algorithm that exploits this near-ordering. However, to consistently benefit from using such algorithms the system should also refrain from using the wrong algorithm for relations which happen not to be sorted at all. Thus, an efficient test is required, i.e., a very fast approximation algorithm for establishing whether a given relation is sufficiently nearly-sorted. In this paper, we provide the theoretical foundations for im proving query evaluation over possibly nearly-sorted relations. First we formally define what it means for a relation to be nearly-sorted, and show how operations over such relations, such as natural join, set operations and sorting, can be executed significantly more efficiently using an algorithm that we provide. If a relation is nearly-sorted enough, then it can be sorted using two sequential reads of the relation, and writing no intermediate data to disk. We then construct efficient probabilistic tests for approximating the degree of the near-sortedness of a relation without having to read an entire file. The role of our algorithms in a database manage- ment system setting is illustrated as soon as the theoretical foundation is laid out. Finally, we outline factors that relate to practical implementations of our algorithms. We show how our test can be enhanced to provide an approximation rather than just a yes-no answer, and discuss its implementability in real-life scenarios where some sparseness may be present in the database files (e.g. if they were created using a B*-tree ap proach). We also show how our sort can benefit distributed systems and systems that use a solid-state drive, which may very well become prevalent in the near future.
机译:当通过相关属性进行排序时,最佳地执行许多关系操作(例如,自然连接操作中的公共属性)。但是,通常不存储各自的关系,因为保持这种方式昂贵(并且只要有多个相关的排序密钥),它就是昂贵的。尽管如此,许多次关系转向几乎排序,大多数元组都在靠近他们的位置。这种状态可能是由“剩余的分类”产生的,其中最初排列的关系被更新,或者在评估到复杂查询时被组合成中期结果。它也可以是属性值之间的弱相关性。目前,几乎排序的关系与未分类的关系相同,并且当对它们进行评估时,使用泛型算法。然而,通过利用这种近排序的算法可以更有效地计算许多操作。但是,为了始终如一地受益于使用这种算法,系统还应避免使用错误的关系算法,这是不进行排序的关系。因此,需要有效的测试,即非常快速地近似算法,用于建立给定关系是否足够分类。在本文中,我们提供了对可能几乎排序关系的信息查询评估的理论基础。首先,我们正式定义了与几乎分类的关系意味着什么,并展示了这种关系的操作如何,例如自然连接,设置操作和排序,可以使用我们提供的算法来显着更有效地执行。如果关系几乎排序,则可以使用两个顺序读取的关系进行排序,并将中间数据写入磁盘。然后,我们构建有效的概率测试,用于近似于关系的近分类程度而无需读取整个文件。一旦布置理论基础,我们的算法在数据库管理系统设置中的作用是披露的。最后,我们概述了与我们算法的实际实施相关的因素。我们展示了我们的测试如何增强,以提供近似,而不是仅仅是yes-no答案,并讨论其在数据库文件中存在一些稀疏性的现实场景中的可实现性(例如,如果它们是使用B *创建的。 -tree ap proach)。我们还展示了我们的排序如何有益于使用固态驱动器的分布式系统和系统,这可能在不久的将来变得非常普遍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号