
Design and implementation of a scalable parallel system for multidimensional analysis and OLAP


Abstract

Multidimensional analysis and On-Line Analytical Processing (OLAP) use summary information that requires aggregate operations along one or more dimensions of numerical data values. Query processing for these applications requires different views of data for decision support. The Data Cube operator provides multi-dimensional aggregates, used to calculate and store summary information on a number of dimensions. The multi-dimensionality of the underlying problem can be represented in both relational and multi-dimensional databases, the latter being a better fit when query performance is the criterion for judgment. Relational databases are scalable in size, and efforts are under way to make their performance acceptable. Multi-dimensional databases, on the other hand, perform well for such queries, although they are not very scalable. Parallel computing is necessary to address the scalability and performance issues for these data sets. In this paper we present a parallel and scalable infrastructure for OLAP and multidimensional analysis. We use chunking to store data either as a dense block using multidimensional arrays (md-arrays) or as a sparse set using a bit-encoded sparse structure (BESS). Chunks provide a multidimensional index structure for efficient dimension-oriented data access, much as md-arrays do. Operations within chunks and between chunks are a combination of relational and multi-dimensional operations, depending on whether the chunk is sparse or dense. We present performance results for data sets with 3, 5 and 10 dimensions for our implementation on the IBM SP-2, which show good speedup and scalability.
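The abstract does not give the BESS layout or the aggregation routines, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes that a sparse chunk stores each non-empty cell as a pair (code, value), where the code packs the cell's per-dimension indices into fixed-width bit fields, and that a data cube is the set of sums over every subset of dimensions. All function names (`bess_encode`, `bess_decode`, `aggregate_sparse`, `data_cube`) and the sample data are hypothetical.

```python
from itertools import combinations

def bits_needed(extent):
    """Bits required to hold indices 0..extent-1 (at least one bit)."""
    return max(1, (extent - 1).bit_length())

def bess_encode(coords, extents):
    """Pack per-dimension indices into one integer (assumed layout:
    first dimension in the highest bits)."""
    code = 0
    for idx, extent in zip(coords, extents):
        code = (code << bits_needed(extent)) | idx
    return code

def bess_decode(code, extents):
    """Recover the per-dimension indices from a packed code."""
    coords = []
    for extent in reversed(extents):
        width = bits_needed(extent)
        coords.append(code & ((1 << width) - 1))
        code >>= width
    return tuple(reversed(coords))

def aggregate_sparse(chunk, extents, keep):
    """Sum the values of a sparse chunk, grouping by the dimensions in `keep`."""
    out = {}
    for code, value in chunk:
        coords = bess_decode(code, extents)
        key = tuple(coords[d] for d in keep)
        out[key] = out.get(key, 0) + value
    return out

def data_cube(chunk, extents):
    """Compute one aggregate per subset of dimensions (2**n group-bys)."""
    n = len(extents)
    return {keep: aggregate_sparse(chunk, extents, keep)
            for r in range(n + 1)
            for keep in combinations(range(n), r)}

# Hypothetical 3-dimensional chunk with extents 4 x 3 x 2 and three non-empty cells.
extents = (4, 3, 2)
cells = {(0, 1, 1): 5, (3, 2, 0): 7, (0, 1, 0): 2}
chunk = [(bess_encode(c, extents), v) for c, v in cells.items()]
cube = data_cube(chunk, extents)
```

Here `cube[(0,)]` holds the totals along the first dimension and `cube[()]` the grand total. A dense chunk would instead keep every cell in an md-array and aggregate by summing along array axes; the sketch covers only the sparse case and ignores partitioning across processors.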
