Efficient database support for OLAP queries (On-line analytical processing).

Abstract

Computing multidimensional aggregates is a performance bottleneck for OLAP and multidimensional data analysis applications. This thesis addresses several problems that arise in speeding up multidimensional queries. The first problem is computing the CUBE operator proposed by Gray et al. We show how the structure of CUBE computation can be viewed as a hierarchy of group-by operations, and present a class of sorting-based algorithms that overlap the computation of different group-by operations while using the least possible memory for each computation. Experiments show that the method dramatically outperforms the straightforward implementation of CUBE as a sequence of SQL group-by operations.

The second and third parts of the thesis deal with caching for multidimensional queries. We developed a novel scheme that caches small regions of the multidimensional space called "chunks". Chunk-based caching allows fine-granularity caching and lets queries partially reuse the results of previous queries with which they overlap. To facilitate the computation of chunks that a query requires but that are missing from the cache, we propose a new organization for relational tables, which we call a "chunked file". Our experiments show that for workloads that exhibit query locality, chunked caching combined with the chunked file organization performs better than traditional query-level caching.

Simple caching, unfortunately, misses the dramatic performance improvements obtainable when the answer to a query, while not immediately available in the cache, can be computed by aggregating other data in the cache. To use aggregation, one must solve two subproblems: (1) determining whether the query is computable from the cache, and (2) determining the fastest path for this aggregation, since there can be many. We present a naive strategy and a virtual-count-based strategy. The virtual-count-based methods determine almost instantaneously whether a query is computable from the cache and which aggregation path is fastest, at the small overhead of maintaining a summary of the cache state. Experiments show that aggregation in the cache leads to substantial performance improvement. The virtual-count-based methods further improve performance over the naive approaches in terms of cache lookup and aggregation times.
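The idea that a group-by can be answered by aggregating a finer group-by, rather than by rescanning the raw data, can be illustrated with a small sketch. This is not the thesis's sorting-based algorithm or its virtual-count method; it is a hash-based illustration that assumes a SUM measure, and the names `group_by`, `cube`, `facts`, and the dimension names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def group_by(rows, in_dims, out_dims):
    """Roll up a {key_tuple: total} table keyed on in_dims to out_dims by summing."""
    positions = [in_dims.index(d) for d in out_dims]
    totals = defaultdict(int)
    for key, value in rows.items():
        totals[tuple(key[p] for p in positions)] += value
    return dict(totals)


def cube(base, dims):
    """Compute every group-by in the lattice, each from its smallest parent.

    `base` is the finest group-by (keyed on all of `dims`). A group-by on k
    dimensions is derived from an already computed group-by on k + 1
    dimensions, choosing the parent with the fewest rows.
    """
    results = {tuple(dims): base}
    for size in range(len(dims) - 1, -1, -1):
        for subset in combinations(dims, size):
            parents = [
                p for p in results
                if len(p) == size + 1 and set(subset) <= set(p)
            ]
            parent = min(parents, key=lambda p: len(results[p]))
            results[subset] = group_by(results[parent], parent, subset)
    return results


# Hypothetical fact rows: (product, store, year, sales).
facts = [
    ("pen", "north", 2000, 5),
    ("pen", "south", 2000, 3),
    ("ink", "north", 2000, 2),
    ("pen", "north", 1999, 4),
]
dims = ("product", "store", "year")

# Build the finest group-by once from the raw facts.
base = defaultdict(int)
for *key, sales in facts:
    base[tuple(key)] += sales

results = cube(dict(base), dims)
print(results[("product",)])
print(results[()])
```

Choosing the smallest parent is one simple heuristic for picking an aggregation path; the abstract notes that many such paths can exist, which is why selecting the fastest one is treated as a problem in its own right.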

Bibliographic record

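  • Author affiliation: The University of Wisconsin - Madison
  • Degree-granting institution: The University of Wisconsin - Madison
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2000
  • Pages: 145 p.
  • Total pages: 145
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology
  • Date added to database: 2022-08-17 11:47:29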
