
Fast Computation of Sparse Datacubes


Abstract

Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. Real-world data is frequently sparse, and hence efficiently computing datacubes over large sparse relations is important. We show that current techniques for computing datacubes over sparse relations do not scale well with the number of CUBE BY attributes, especially when the relation is much larger than main memory.

We propose a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrate the efficiency of our algorithm using synthetic, benchmark and real-world data sets. When the relation fits in memory, our technique performs multiple in-memory sorts, and does not incur any I/O beyond the input of the relation and the output of the datacube itself. When the relation does not fit in memory, a divide-and-conquer strategy divides the problem of computing the datacube into several simpler computations of sub-datacubes. Often, all but one of the sub-datacubes can be computed in memory and our in-memory solution applies. In that case, the total I/O overhead is linear in the number of CUBE BY attributes. We demonstrate with an implementation that the CPU cost of our algorithm is dominated by the I/O cost for sparse relations.
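
To make the notion of a datacube computed by in-memory sorting concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the relation fits in memory, uses SUM as the aggregate, and naively performs one sort per group-by without sharing any sort order between them. The names `sparse_cube` and `ALL` and the sample rows are hypothetical and chosen only for illustration.

```python
from itertools import combinations, groupby
from operator import itemgetter

ALL = "*"  # placeholder marking an attribute that has been aggregated away


def sparse_cube(rows, num_dims):
    """Return {cell: sum_of_measure} for every subset of the CUBE BY attributes.

    Each row is a tuple (d0, ..., d_{n-1}, measure). Only cells that actually
    occur in the data are produced, so sparse inputs yield small outputs.
    """
    result = {}
    for k in range(num_dims + 1):
        for dims in combinations(range(num_dims), k):
            # Project each row onto the chosen attributes, keeping the measure.
            projected = [
                (tuple(row[i] for i in dims), row[num_dims]) for row in rows
            ]
            # One in-memory sort per group-by; equal keys become adjacent runs.
            projected.sort(key=itemgetter(0))
            for key, run in groupby(projected, key=itemgetter(0)):
                cell = [ALL] * num_dims
                for i, value in zip(dims, key):
                    cell[i] = value
                result[tuple(cell)] = sum(m for _, m in run)
    return result


# Hypothetical three-row relation with two CUBE BY attributes and one measure.
rows = [
    ("a1", "b1", 10),
    ("a1", "b2", 5),
    ("a2", "b1", 7),
]
cube = sparse_cube(rows, num_dims=2)
```

With n CUBE BY attributes this sketch performs 2^n sorts, which illustrates why the number of attributes drives the cost and why reducing redundant sorting and I/O matters for large sparse relations.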
