首页> 外文会议>SIGMOD/PODS 2007 >Lazy, Adaptive RID-List Intersection, and Its Application to Index Anding
【24h】

Lazy, Adaptive RID-List Intersection, and Its Application to Index Anding

机译:懒惰,自适应RID-List交叉点及其在索引Anding中的应用

获取原文

摘要

RID-List (row id list) intersection is a common strategy in query processing, used in star joins, column stores, and even search engines. To apply a conjunction of predicates on a table, a query processor does index lookups to form sorted RID-lists (or bitmap) of the rows matching each predicate, then intersects the RID-lists via an AND-tree, and finally fetches the corresponding rows to apply any residual predicates and aggregates. This process can be expensive when the RID-lists are large. Furthermore, the performance is sensitive to the order in which RIDlists are intersected together, and to treating the right predicates as residuals. If the optimizer chooses a wrong order or a wrong residual, due to a poor cardinality estimate, the resulting plan can run orders of magnitude slower than expected. We present a new algorithm for RID-list intersection that is both more efficient and more robust than this standard algorithm. First, we avoid forming the RID-lists up front, and instead form this lazily as part of the intersection. This reduces the associated IO and sort cost significantly, especially when the data distribution is skewed. It also ameliorates the problem of wrong residual table selection. Second, we do not intersect the RID-lists via an AND-tree, because this is vulnerable to cardinality mis-estimations. Instead, we use an adaptive set intersection algorithm that performs well even when the cardinality estimates are wrong. We present detailed experiments of this algorithm on data with varying distributions to validate its efficiency and predictability.
机译:RID-List(行ID列表)交集是查询处理中的常用策略,用于星型联接,列存储,甚至搜索引擎。为了在表上应用谓词的合取,查询处理器进行索引查找以形成与每个谓词匹配的行的排序RID列表(或位图),然后通过AND树与RID列表相交,最后获取对应的行以应用任何残留谓词和聚合。当RID列表很大时,此过程可能会很昂贵。此外,性能对于将RIDlist交叉在一起的顺序以及将正确的谓词视为残差都非常敏感。如果由于基数估计不正确,优化器选择了错误的顺序或错误的残差,则生成的计划的运行速度可能会比预期的慢几个数量级。我们提出了一种新的RID列表相交算法,它比该标准算法更有效,更健壮。首先,我们避免预先形成RID列表,而将其作为交集的一部分而懒惰地形成。这样可以显着降低相关的IO和排序成本,尤其是在数据分配不正确时。它还改善了错误的剩余表选择问题。其次,我们不通过AND树相交RID列表,因为它容易受到基数错误估计的影响。取而代之的是,我们使用自适应集合相交算法,即使基数估计错误,该算法也能很好地执行。我们目前对具有不同分布的数据进行该算法的详细实验,以验证其效率和可预测性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号