Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors

YING CHEN; FRANK DEHNE; TODD EAVIS; ANDREW RAU-CHAPLIN

Abstract

The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. To meet the need for improved performance created by growing data sizes, parallel solutions for generating the data cube are becoming increasingly important. This paper presents a parallel method for generating data cubes on a shared-nothing multiprocessor. Since no (expensive) shared disk is required, our method can be used on low-cost Beowulf-style clusters consisting of standard PCs with local disks connected via a data switch. Our approach uses a ROLAP representation of the data cube in which views are stored as relational tables. This allows for tight integration with current relational database technology. We have implemented our parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, local vs. global schedule trees, data skew, cardinality of dimensions, data dimensionality, and balance tradeoffs. For an input data set of 2,000,000 rows (72 Megabytes), our parallel data cube generation method achieves close to optimal speedup, generating a full data cube of ≈227 million rows (5.6 Gigabytes) on a 16-processor cluster in under 6 minutes. For an input data set of 10,000,000 rows (360 Megabytes), our parallel method, running on a 16-processor PC cluster, created a data cube consisting of ≈846 million rows (21.7 Gigabytes) in under 47 minutes.
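
To make the term "full data cube" concrete, the following is a minimal sequential sketch, not the parallel method of the paper. It assumes a tiny hypothetical fact table with three dimensions (`store`, `item`, `year`) and one measure (`sales`), and it materializes every group-by view as a separate table, in the spirit of the ROLAP representation mentioned above. The function name `full_cube` and all sample values are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def full_cube(rows, dims, measure):
    """Return {view: {group_key: sum}} for every subset of `dims`.

    Each view is a tuple of dimension names; its table maps a tuple of
    dimension values to the summed measure. With d dimensions there are
    2**d views, including the empty view (the grand total).
    """
    cube = {}
    for k in range(len(dims), -1, -1):
        for view in combinations(dims, k):
            table = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in view)
                table[key] += row[measure]
            cube[view] = dict(table)
    return cube


# Hypothetical fact table, for illustration only.
rows = [
    {"store": "A", "item": "x", "year": 2003, "sales": 5},
    {"store": "A", "item": "y", "year": 2003, "sales": 3},
    {"store": "B", "item": "x", "year": 2004, "sales": 7},
]
cube = full_cube(rows, ("store", "item", "year"), "sales")
```

This naive version rescans the input once per view. A practical cube generator would instead derive smaller views from larger ones already computed, which is the kind of decision the schedule trees mentioned in the abstract presumably address.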

Bibliographic record

Source
Distributed and Parallel Databases, 2004, Issue 3, pp. 219-236 (18 pages)
Authors
YING CHEN; FRANK DEHNE; TODD EAVIS; ANDREW RAU-CHAPLIN
Author affiliation

Faculty of Computer Science, Dalhousie University, Halifax, Canada
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
data warehousing; OLAP; data cube; high performance computing
