Parallel Star Join+DataIndexes: efficient query processing in data warehouses and OLAP

Datta A.; VanderMeer D.; Ramamritham K.

Abstract

On-line analytical processing (OLAP) refers to the technologies that allow users to efficiently retrieve data from the data warehouse for decision-support purposes. Data warehouses tend to be extremely large: it is quite possible for a data warehouse to be hundreds of gigabytes to terabytes in size (Chaudhuri and Dayal, 1997). Queries tend to be complex and ad hoc, often requiring computationally expensive operations such as joins and aggregation. Given this, we are interested in developing strategies for improving query processing in data warehouses by exploring the applicability of parallel processing techniques. In particular, we exploit the natural partitionability of a star schema and render it even more efficient by applying DataIndexes, a storage structure that serves both as an index and as data and lends itself naturally to vertical partitioning of the data. DataIndexes are derived from the various special-purpose access mechanisms currently supported in commercial OLAP products. Specifically, we propose a declustering strategy that incorporates both task and data partitioning, and we present the Parallel Star Join (PSJ) Algorithm, which provides a means to perform a star join in parallel using efficient operations involving only rowsets and projection columns. We compare the performance of the PSJ Algorithm with two parallel query processing strategies. The first is a parallel join strategy utilizing the Bitmap Join Index (BJI), arguably the state-of-the-art OLAP join structure in use today. For the second strategy, we choose a well-known parallel join algorithm, namely the pipelined hash algorithm. To assist in the performance comparison, we first develop a cost model of the disk access and transmission costs for all three approaches.
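
The abstract's idea of a star join built from rowsets and projection columns can be illustrated with a small sketch. This is not the authors' PSJ Algorithm or their DataIndex structures, which the abstract does not specify. It is a single-process toy that assumes a vertically partitioned fact table held as Python lists, and every name in it (fact_columns, rowset, star_join, the sample data) is hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy star schema stored column by column (vertical partitioning).
# Each fact column is a list indexed by row id; foreign-key columns hold
# positions into the dimension columns below.
fact_columns: Dict[str, List] = {
    "store_fk": [0, 1, 0, 2, 1],
    "product_fk": [1, 1, 0, 0, 1],
    "revenue": [10.0, 20.0, 5.0, 7.5, 12.5],
}
store_region = ["east", "west", "east"]   # dimension column: store.region
product_category = ["toys", "books"]      # dimension column: product.category


def rowset(fk_column: List[int], dim_column: List, predicate: Callable) -> Set[int]:
    """Return ids of fact rows whose referenced dimension value passes predicate."""
    matching_keys = {k for k, v in enumerate(dim_column) if predicate(v)}
    return {row for row, fk in enumerate(fk_column) if fk in matching_keys}


def star_join(rowsets: List[Set[int]], projection: List[str]) -> List[tuple]:
    """Intersect the per-dimension rowsets, then read only the projected columns."""
    rows = sorted(set.intersection(*rowsets))
    return [tuple(fact_columns[c][r] for c in projection) for r in rows]


# Each rowset depends on a single foreign-key column, so in a parallel
# setting the two calls below could run on different processors
# (an assumption of this sketch, not a claim about the paper's design).
east = rowset(fact_columns["store_fk"], store_region, lambda v: v == "east")
books = rowset(fact_columns["product_fk"], product_category, lambda v: v == "books")
result = star_join([east, books], ["revenue"])
```

Only the two foreign-key columns and the projected revenue column are touched here, which is the intuition behind working with rowsets and projection columns rather than whole fact rows.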

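Bibliographic details

Source
IEEE Transactions on Knowledge and Data Engineering | 2002, No. 6 | pp. 1299-1316 | 18 pages
Authors
Datta A.; VanderMeer D.; Ramamritham K.
Author affiliation

Georgia Inst. of Technol., Atlanta, GA, USA

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Radio electronics; telecommunications technology
Keywords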
data warehouses; data mining; query processing; database indexing; parallel algorithms; software performance evaluation; Parallel Star Join; DataIndexes; query processing; data warehouses; OLAP; online analytical processing; decision-support; aggrega;

Similar literature

1. Efficient OLAP query processing in distributed data warehouses [J]. Michael O. Akinde, Michael H. Boehlen, Theodore Johnson, Information Systems. 2003, No. 1-2
2. Parallel OLAP query processing in database clusters with data replication [J]. Alexandre A.B. Lima, Camille Furtado, Patrick Valduriez, Distributed and Parallel Databases. 2009, No. 1-2
3. Finding an efficient rewriting of OLAP queries using materialized views in data warehouses [J]. Chang-Sup Park, Myoung Ho Kim, Yoon-Joon Lee, Decision Support Systems. 2002, No. 4
4. Efficient OLAP query processing in distributed data warehouses [C]. Akinde, M., Bohlen, . 2002
5. Efficient database support for OLAP queries (On-line analytical processing) [D]. Deshpande, Prasad Manikarao. 2000
6. Optimizing healthcare research data warehouse design through past COSTAR query analysis [O]. S. N. Murphy, M. M. Morgan, G. O. Barnett, 1999
7. Parallel Star Join + Data Indexes: efficient query processing in data warehouses and OLAP [O]. Datta Anindya, VanderMeer Debra, Ramamritham Krithi, 2002
