Journal: Future Generation Computer Systems

Implementing data cube construction using a cluster middleware: algorithms, implementation experience, and performance evaluation



Abstract

As the amount of data available for analysis in commercial settings grows, online analytical processing (OLAP) and decision support have become important applications for high-performance computing. Implementing such applications on clusters requires considerable expertise and effort, particularly because of the sizes of the input and output datasets. In this paper, we describe our experience in developing one such application using a cluster middleware called ADR. We focus on the problem of data cube construction, which commonly arises in multi-dimensional OLAP. We show how ADR, originally developed for scientific data-intensive applications, can be used to build an efficient and scalable data cube construction implementation. A particular issue with the use of ADR is the tiling of output datasets. We present new algorithms that combine interprocessor communication with tiling within each processor. These algorithms preserve the important properties that are desirable in any parallel data cube construction algorithm. We have carried out a detailed evaluation of our implementation. The main results from our experiments are as follows: (1) high speedups are achieved on both dense and sparse datasets, even though we use simple algorithms that sequentialize a part of the computation; (2) the execution time depends only upon the amount of computation, and does not increase in a super-linear fashion as the dataset size or the number of tiles increases; and (3) as the datasets become sparser, sequential performance degrades, but the parallel speedups remain quite good. As part of our ongoing work in this area, we are also looking at handling a larger number of dimensions and multi-dimensional partitionings. We describe our preliminary theoretical and experimental work in this direction.
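For readers unfamiliar with the term, data cube construction means computing every group-by aggregate over the dimensions of a dataset. The following is a minimal sequential sketch in Python, under stated assumptions: it is not the paper's algorithm, it does not use ADR, and it ignores tiling and interprocessor communication entirely. The function name `build_data_cube`, the dictionary-based sparse representation, and the choice of summation as the aggregate are all hypothetical, chosen only for illustration. Each aggregate with k dimensions is derived from an already computed aggregate with k + 1 dimensions rather than from the raw input.

```python
from itertools import combinations


def build_data_cube(facts, num_dims):
    """Compute every group-by (cuboid) of a sparse fact table by summation.

    facts: dict mapping a full coordinate tuple to a numeric measure.
    Returns a dict mapping a tuple of retained dimension indices to that
    cuboid, itself a dict from projected coordinates to summed measures.
    """
    all_dims = tuple(range(num_dims))
    cube = {all_dims: dict(facts)}
    # Walk the lattice top-down: a cuboid with k dimensions is derived
    # from an already computed cuboid with k + 1 dimensions.
    for k in range(num_dims - 1, -1, -1):
        for dims in combinations(all_dims, k):
            # Candidate parents retain exactly one extra dimension.
            parents = [p for p in cube
                       if len(p) == k + 1 and set(dims) <= set(p)]
            # Illustrative heuristic: aggregate from the smallest parent.
            parent = min(parents, key=lambda p: len(cube[p]))
            keep = [parent.index(d) for d in dims]
            child = {}
            for coords, value in cube[parent].items():
                key = tuple(coords[i] for i in keep)
                child[key] = child.get(key, 0) + value
            cube[dims] = child
    return cube
```

For a three-dimensional input the result holds 2^3 = 8 cuboids, from the full table (key `(0, 1, 2)`) down to the grand total (key `()`). Deriving each cuboid from a smaller, already aggregated parent is a common design choice in cube construction because it avoids rescanning the raw input for every group-by.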
