Bottom-Up Computation of Sparse and Iceberg CUBEs

机译：自下而上的稀疏和冰山多维数据集计算

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional association rules, and (3) to complement existing strategies for identifying interesting subsets of the CUBE for precomputation. We present a new algorithm (BUC) for Iceberg-CUBE computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC avoids computing the larger group-bys that do not meet minimum support. The pruning in BUC is similar to the pruning in the Apriori algorithm for association rules, except that BUC trades some pruning for locality of reference and reduced memory requirements. BUC uses the same pruning strategy when computing sparse, complete CUBEs. We present a through performance evaluation over a broad range of workloads. Our evaluation demonstrates that (in contrast to earlier assumptions) minimizing the aggregations or the number of sorts is not the most important aspect of the sparse CUBE problem. The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve Iceberg-CUBE computation.

机译：我们将冰山立方体问题介绍为Datacube（立方体）问题的重新制作。 iceBerg-Cube问题是仅计算具有超过一些最小支持阈值的聚合值（例如，计数）的分区。可以使用（1）使用（1）来回答群组的子宫序列，例如具有计数（*）> = x，其中x大于阈值，（2）用于挖掘多维关联规则，以及（3）补充现有策略，用于识别立方体的有趣子集以进行预计。我们为冰山 - 多维数据集计算提供了一种新的算法（BUC）。 Buc建立自下卧铺;即，它通过在单个属性上从组开始，然后在一对属性上开始，构建多维数据集，然后在一对属性上，然后在三个属性上进行组。这与前面提出的所有技术相反，用于计算立方体，并且具有重要的实际优势：Buc避免计算不符合最小支持的较大的群体。 BUC中的修剪类似于APRiori算法的修剪，以便关联规则除，除了BUC交易某种修剪，以便参考和降低内存要求。 BUC在计算稀疏，完整的立方体时使用相同的修剪策略。我们通过各种工作负载进行了绩效评估。我们的评估表明（与早期假设相比）最小化聚合或排序的数量不是稀疏立方体问题的最重要方面。 Buc中的修剪与有效的排序方法组合，使得Buc能够优于稀疏多维数据集的所有先前算法，即使计算整个立方体，也可以大大提高冰山 - 立方体计算。

著录项

来源
《ACM SIGMOD International Conference on Management of Data》|1999年||共12页
会议地点
作者
Kevin Beyer; Raghu Ramakrishnan;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类 TP3-532;
关键词

相似文献

外文文献
中文文献
专利

1. Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach [J] . Dong Xin, Jiawei Han, Xiaolei Li, IEEE Transactions on Knowledge and Data Engineering . 2007,第期

机译：通过自上而下和自下而上的集成计算冰山多维数据集：StarCubing方法
2. Efficient Computation of Iceberg Cubes by Bounding Aggregate Functions [J] . Zhang Xiuzhen, Chou Pauline Lienhua, Dong Guozhu IEEE Transactions on Knowledge and Data Engineering . 2007,第7期

机译：有界聚合函数对冰山立方体的高效计算
3. Fast Computation of Sparse Data Cubes with Constraints [J] . Feng Yu-cai, Chen Chang-qing, Feng Jian-lin, Wuhan University Journal of Natural Sciences . 2004,第2期

机译：带约束的稀疏数据立方体的快速计算
4. Bottom-Up Computation of Sparse and Iceberg CUBEs [C] . Kevin Beyer, Raghu Ramakrishnan ACM SIGMOD International Conference on Management of Data . 1999

机译：自下而上的稀疏和冰山多维数据集计算
5. The multi-tree cubing algorithm for computing iceberg cubes. [D] . Li, Xing. 2005

机译：用于计算冰山多维数据集的多树多维数据集算法。
6. Lateral gene transfer in eukaryotes: tip of the iceberg or of the ice cube? [O] . Etienne G. J. Danchin 2016

机译：真核生物的侧向基因转移：冰山一角还是冰块一角？
7. Bottom-up computation of sparse and Iceberg CUBE [O] . Kevin Beyer, Raghu Ramakrishnan 1999

机译：稀疏和冰山CUBE的自下而上计算
8. Introducing Enabling Computational Tools to the Climate Sciences: Multi-Resolution Climate Modeling with Adaptive Cubed-Sphere Grids. [R] . Jablonowski, C. 2015

机译：介绍为气候科学提供计算工具：采用自适应立方球网格的多分辨率气候模拟。

Bottom-Up Computation of Sparse and Iceberg CUBEs

摘要

著录项

相似文献

相关主题

期刊订阅