Efficient query processing framework for big data warehouse: an almost join-free approach

Huiju WANG; Xiongpai QIN; Xuan ZHOU; Furong LI; Zuoyan QIN; Qing ZHU; Shan WANG

Abstract

The rapidly increasing scale of data warehouses is challenging today's data analytics technologies. A conventional data analytics platform processes data warehouse queries using a star schema: it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users' demands. This model is space-efficient. However, it faces two problems when applied to big data. First, join is an expensive operation, which prevents a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, even though many join results could be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the post-processing phase, so that data warehouse queries can be transformed into massively parallel filter-aggregation operations on the fact table. In contrast to conventional query processing models, our approach is efficient, scalable and stable even when a large number of tables are involved in the join. It is especially suitable for large-scale parallel data warehouses. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.
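The abstract does not spell out how the joins are split between the two phases, so the following is only a minimal Python sketch of the general join-free idea under stated assumptions, not the authors' framework. It assumes that the pre-processing phase resolves predicates on small dimension tables into sets of surrogate keys, that each fact-table partition is then filtered and partially aggregated independently (as a map task might do on Hadoop), that the partial results are merged, and that the post-processing phase maps the surrogate group keys back to dimension attributes. All table, column and function names (`preprocess`, `filter_aggregate`, `merge`, `postprocess`, `date_dim`, `store_dim`, `sales`) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical star schema: a dimension table maps surrogate key -> attributes,
# and a fact row maps column name -> integer (foreign keys and measures).
Dimension = Dict[int, Dict[str, str]]
FactRow = Dict[str, int]


def preprocess(dim: Dimension, predicate: Callable[[Dict[str, str]], bool]) -> Set[int]:
    """Pre-processing: evaluate a predicate on a small dimension table and
    return the qualifying surrogate keys, so the fact scan needs no join."""
    return {key for key, attrs in dim.items() if predicate(attrs)}


def filter_aggregate(partition: Iterable[FactRow], fk_filters: Dict[str, Set[int]],
                     group_fk: str, measure: str) -> Dict[int, int]:
    """Per-partition work (map side): keep fact rows whose foreign keys fall in
    the pre-computed key sets and sum the measure per group foreign key."""
    partial: Dict[int, int] = defaultdict(int)
    for row in partition:
        if all(row[fk] in keys for fk, keys in fk_filters.items()):
            partial[row[group_fk]] += row[measure]
    return dict(partial)


def merge(partials: Iterable[Dict[int, int]]) -> Dict[int, int]:
    """Combine the partial aggregates of all partitions (reduce side)."""
    total: Dict[int, int] = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def postprocess(result: Dict[int, int], dim: Dimension, attr: str) -> Dict[str, int]:
    """Post-processing: replace surrogate group keys with a dimension attribute.
    Only the small aggregated result is looked up in the dimension table."""
    labelled: Dict[str, int] = defaultdict(int)
    for key, value in result.items():
        labelled[dim[key][attr]] += value
    return dict(labelled)


# Toy data; the fact table is split into two partitions, as a parallel scan would see it.
date_dim: Dimension = {1: {"year": "2013"}, 2: {"year": "2014"}, 3: {"year": "2014"}}
store_dim: Dimension = {10: {"region": "north"}, 20: {"region": "south"}}
partitions: List[List[FactRow]] = [
    [{"date_fk": 1, "store_fk": 10, "sales": 5}, {"date_fk": 2, "store_fk": 10, "sales": 7}],
    [{"date_fk": 3, "store_fk": 20, "sales": 4}, {"date_fk": 2, "store_fk": 20, "sales": 1}],
]

# Query: total sales per store region in 2014.
keys_2014 = preprocess(date_dim, lambda attrs: attrs["year"] == "2014")
partials = [filter_aggregate(p, {"date_fk": keys_2014}, "store_fk", "sales") for p in partitions]
merged = merge(partials)
result = postprocess(merged, store_dim, "region")
print(result)  # {'north': 7, 'south': 5}
```

In this sketch the scan over the large fact table only performs key-membership tests and additions, so each partition can be processed independently and in parallel; dimension tables are touched only before the scan (to build key sets) and after it (to label a result that is already small).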

Bibliographic details

Source: Frontiers of computer science in China, 2015, No. 2, pp. 224-236 (13 pages)
Authors: Huiju WANG; Xiongpai QIN; Xuan ZHOU; Furong LI; Zuoyan QIN; Qing ZHU; Shan WANG
Affiliations

DEKE Lab (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China; School of Computing, National University of Singapore, Singapore 117417, Singapore

DEKE Lab (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

DEKE Lab (Renmin University of China), Beijing 100872, China

DEKE Lab (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

DEKE Lab (Renmin University of China), Beijing 100872, China

DEKE Lab (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

DEKE Lab (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

Indexing information
Original format: PDF
Language: English
Keywords: data warehouse; large scale; TAMP; join-free; multi-version schema

Similar documents

1. Parallel Star Join + DataIndexes: efficient query processing in data warehouses and OLAP [J]. Datta A., VanderMeer D., Ramamritham K. IEEE Transactions on Knowledge and Data Engineering, 2002, No. 6.

2. Effectively and Efficiently Designing and Querying Parallel Relational Data Warehouses on Heterogeneous Database Clusters: The F&A Approach [J]. Ladjel Bellatreche, Alfredo Cuzzocrea, Soumia Benkrid. Journal of Database Management, 2012, No. 4.

3. Efficient OLAP query processing in distributed data warehouses [J]. Michael O. Akinde, Michael H. Boehlen, Theodore Johnson, et al. Information Systems, 2003, No. 1-2.

4. An efficient query processing with approval of data reliability using RBF neural networks with web enabled data warehouse [C]. Soundararajan K., Sureshkumar S., Selvamani P. International Conference on Computing Communication and Networking Technologies, 2013.

5. Approximate Query Processing in a Data Warehouse Using Random Sampling [D]. Nguyen, Trong Duc. 2020.

6. Clinical Data Warehouse Query and Learning Tool Using a Human-Centered Participatory Design Process [O]. Sarah MULLIN, Jane ZHAO, Shyamashree SINHA.

7. Parallel Star Join + Data Indexes: efficient query processing in data warehouses and OLAP [O]. DATTA ANINDYA, VANDERMEER DEBRA, RAMAMRITHAM KRITHI. 2002.

8. Allocation of Database Files Across Parallel Stores for Efficient Processing of Partial-Match Queries [R]. Bestul, T., Jajodia, S. 1987.