首页> 外文会议>ACM SIGMOD International Conference on Management of Data >Bottom-Up Computation of Sparse and Iceberg CUBEs
【24h】

Bottom-Up Computation of Sparse and Iceberg CUBEs

机译:自下而上的稀疏和冰山多维数据集计算

获取原文

摘要

We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional association rules, and (3) to complement existing strategies for identifying interesting subsets of the CUBE for precomputation. We present a new algorithm (BUC) for Iceberg-CUBE computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC avoids computing the larger group-bys that do not meet minimum support. The pruning in BUC is similar to the pruning in the Apriori algorithm for association rules, except that BUC trades some pruning for locality of reference and reduced memory requirements. BUC uses the same pruning strategy when computing sparse, complete CUBEs. We present a through performance evaluation over a broad range of workloads. Our evaluation demonstrates that (in contrast to earlier assumptions) minimizing the aggregations or the number of sorts is not the most important aspect of the sparse CUBE problem. The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve Iceberg-CUBE computation.
机译:我们将冰山立方体问题介绍为Datacube(立方体)问题的重新制作。 iceBerg-Cube问题是仅计算具有超过一些最小支持阈值的聚合值(例如,计数)的分区。可以使用(1)使用(1)来回答群组的子宫序列,例如具有计数(*)> = x,其中x大于阈值,(2)用于挖掘多维关联规则,以及(3)补充现有策略,用于识别立方体的有趣子集以进行预计。我们为冰山 - 多维数据集计算提供了一种新的算法(BUC)。 Buc建立自下卧铺;即,它通过在单个属性上从组开始,然后在一对属性上开始,构建多维数据集,然后在一对属性上,然后在三个属性上进行组。这与前面提出的所有技术相反,用于计算立方体,并且具有重要的实际优势:Buc避免计算不符合最小支持的较大的群体。 BUC中的修剪类似于APRiori算法的修剪,以便关联规则除,除了BUC交易某种修剪,以便参考和降低内存要求。 BUC在计算稀疏,完整的立方体时使用相同的修剪策略。我们通过各种工作负载进行了绩效评估。我们的评估表明(与早期假设相比)最小化聚合或排序的数量不是稀疏立方体问题的最重要方面。 Buc中的修剪与有效的排序方法组合,使得Buc能够优于稀疏多维数据集的所有先前算法,即使计算整个立方体,也可以大大提高冰山 - 立方体计算。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号