Journal: Distributed and Parallel Databases

Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors

Abstract

The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. To meet the need for improved performance created by growing data sizes, parallel solutions for generating the data cube are becoming increasingly important. This paper presents a parallel method for generating data cubes on a shared-nothing multiprocessor. Since no (expensive) shared disk is required, our method can be used on low-cost Beowulf-style clusters consisting of standard PCs with local disks connected via a data switch. Our approach uses a ROLAP representation of the data cube, in which views are stored as relational tables. This allows for tight integration with current relational database technology. We have implemented our parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, local vs. global schedule trees, data skew, cardinality of dimensions, data dimensionality, and balance tradeoffs. For an input data set of 2,000,000 rows (72 Megabytes), our parallel data cube generation method achieves close to optimal speedup, generating a full data cube of ≈227 million rows (5.6 Gigabytes) on a 16-processor cluster in under 6 minutes. For an input data set of 10,000,000 rows (360 Megabytes), our parallel method, running on a 16-processor PC cluster, created a data cube consisting of ≈846 million rows (21.7 Gigabytes) in under 47 minutes.
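To make the terms in the abstract concrete, the following is a minimal Python sketch of a ROLAP-style data cube built in a shared-nothing fashion. It is an illustration under stated assumptions, not the paper's method: the paper's schedule-tree approach is not reproduced here, the aggregate is assumed to be SUM, the data is split round-robin, and the names `local_cube`, `merge_cubes`, and `parallel_cube` are hypothetical. Each simulated processor aggregates only its own partition into all 2^d group-by views, and the partial views are then merged.

```python
from collections import defaultdict
from itertools import combinations


def local_cube(rows, dims):
    """Aggregate one partition into every group-by view (SUM assumed).

    rows: list of (dimension_values_tuple, measure) pairs.
    Returns {view_dims_tuple: {group_key_tuple: total}}, i.e. one small
    relational table per view, as in a ROLAP representation.
    """
    d = len(dims)
    views = {}
    for k in range(d + 1):
        for subset in combinations(range(d), k):
            agg = defaultdict(int)
            for values, measure in rows:
                key = tuple(values[i] for i in subset)
                agg[key] += measure
            views[tuple(dims[i] for i in subset)] = dict(agg)
    return views


def merge_cubes(partials):
    """Combine per-partition views by adding totals for matching keys."""
    merged = {}
    for views in partials:
        for view, table in views.items():
            target = merged.setdefault(view, defaultdict(int))
            for key, total in table.items():
                target[key] += total
    return {view: dict(table) for view, table in merged.items()}


def parallel_cube(rows, dims, p):
    """Simulate p shared-nothing processors, each with a local partition."""
    partitions = [rows[i::p] for i in range(p)]  # round-robin split (assumption)
    return merge_cubes(local_cube(part, dims) for part in partitions)


if __name__ == "__main__":
    rows = [(("a", "x"), 1), (("a", "y"), 2), (("b", "x"), 3)]
    cube = parallel_cube(rows, ("A", "B"), p=2)
    for view, table in sorted(cube.items()):
        print(view, table)
```

A d-dimensional input yields 2^d views in this sketch, which is why the full cube in the abstract is far larger than its input. The loop here runs the partitions sequentially; on a real cluster each `local_cube` call would run on a separate node against its local disk.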
