首页> 外文会议>14th international conference on database theory 2011 >Detecting and Exploiting Near-Sortedness for Efficient Relational Query Evaluation
【24h】

Detecting and Exploiting Near-Sortedness for Efficient Relational Query Evaluation

机译:检测和利用近似度进行有效的关系查询评估

获取原文
获取原文并翻译 | 示例

摘要

Many relational operations are best performed when the relations are stored sorted over the relevant attributes (e.g. the common attributes in a natural join operation). However, generally relations are not stored sorted because it is expensive to maintain them this way (and impossible whenever there is more than one relevant sort key). Still, many times relations turn out to be nearly-sorted, where most tuples are close to their place in the order. This state can result from "leftover sortedness", where originally sorted relations were updated, or were combined into interim results when evalu ating a complex query. It can also result from weak correlations between attribute values. Currently, nearly-sorted relations are treated the same as unsorted relations, and when relational operations are evaluated for them, a generic algorithm is used. Yet, many operations can be computed more efficiently by an algorithm that exploits this near-ordering. However, to consistently benefit from using such algorithms the system should also refrain from using the wrong algorithm for relations which happen not to be sorted at all. Thus, an efficient test is required, i.e., a very fast approximation algorithm for establishing whether a given relation is sufficiently nearly-sorted. In this paper, we provide the theoretical foundations for im proving query evaluation over possibly nearly-sorted relations. First we formally define what it means for a relation to be nearly-sorted, and show how operations over such relations, such as natural join, set operations and sorting, can be executed significantly more efficiently using an algorithm that we provide. If a relation is nearly-sorted enough, then it can be sorted using two sequential reads of the relation, and writing no intermediate data to disk. We then construct efficient probabilistic tests for approximating the degree of the near-sortedness of a relation without having to read an entire file. The role of our algorithms in a database manage- ment system setting is illustrated as soon as the theoretical foundation is laid out. Finally, we outline factors that relate to practical implementations of our algorithms. We show how our test can be enhanced to provide an approximation rather than just a yes-no answer, and discuss its implementability in real-life scenarios where some sparseness may be present in the database files (e.g. if they were created using a B*-tree ap proach). We also show how our sort can benefit distributed systems and systems that use a solid-state drive, which may very well become prevalent in the near future.
机译:当按相关属性(例如,自然联接操作中的公共属性)排序存储关系时,许多关系操作最好执行。但是,通常不对关系进行存储排序,因为以这种方式维护它们很昂贵(并且当存在多个相关的排序键时,这是不可能的)。尽管如此,很多时候关系却几乎是零散的,其中大多数元组都在顺序中接近它们的位置。此状态可能是由于“剩余排序”导致的,其中原始排序的关系已更新,或者在评估复杂查询时被合并为临时结果。这也可能是由于属性值之间的弱相关性导致的。当前,将几乎排序的关系与未排序的关系视为相同,并且当对其进行关系运算评估时,将使用通用算法。然而,利用这种近序算法可以更有效地计算许多运算。但是,为了始终从使用此类算法中受益,系统还应避免对碰巧根本没有排序的关系使用错误的算法。因此,需要有效的测试,即,非常快速的近似算法,用于确定给定的关系是否足够近似地排序。在本文中,我们为改进可能近似排序的关系的查询评估提供了理论基础。首先,我们正式定义对关系进行近乎排序的含义,并说明如何使用我们提供的算法更有效地执行针对这种关系的操作(例如自然联接,设置操作和排序)。如果一个关系几乎被排序,则可以使用该关系的两次顺序读取来排序,并且不向磁盘写入任何中间数据。然后,我们构造有效的概率测试,以近似关系的近似度,而不必读取整个文件。一旦奠定了理论基础,就说明了我们的算法在数据库管理系统设置中的作用。最后,我们概述了与我们算法的实际实现相关的因素。我们展示了如何增强测试以提供近似值,而不只是一个是与否的答案,并讨论在数据库文件中可能存在一些稀疏性的现实情况下(例如,如果使用B *创建它们的情况下)其可实施性。 -tree ap方法)。我们还展示了我们的解决方案如何使分布式系统和使用固态驱动器的系统受益,这在不久的将来可能会变得越来越普遍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号