Declaration
Abstract
Abstract (in Chinese)
Table of Contents
1 INTRODUCTION
1.1 Online Analytical Processing: From Centralized Systems to Distributed Systems
1.1.1 Motivations
1.1.2 Elements of explored solutions
1.2 Issues
1.2.1 Deploy multidimensional data over a cluster
1.2.2 Query a warehouse based on an HBase cluster
1.3 Contributions
1.4 Structure of the thesis
2 STATE OF THE ART
2.1 Data Warehouse and OLAP
2.1.1 Foundations
2.1.2 Multidimensional model
2.1.3 Functional architecture of an OLAP system
2.1.4 Storage models
2.2 Hadoop Ecosystem
2.2.1 Hadoop Framework
2.2.2 MapReduce
2.2.3 HDFS: The Hadoop Distributed File System
2.2.4 HBase
2.3 Data warehouses in a distributed environment
2.3.1 Fragmentation of the warehouse
2.3.2 Warehouse on a distributed database
2.4 Conclusion
3 Multidimensional Data on Distributed Storage
3.1 Use Cases
3.2 Conceptual model for multidimensional data
3.2.1 Schema and Instance of Dimension
3.2.2 Facts and Aggregates
3.2.3 Local Instances of Dimension
3.3 Identification of multidimensional data
3.3.1 Definition and identification of multidimensional chunks
3.3.2 Construction of chunk blocks
3.4 Multidimensional data indexing
3.4.1 Indexes on different aggregation levels
3.4.2 Indexes on chunk blocks
3.4.3 CCB Index Operations
3.5 Conclusion
4 REACTIVE SCHEDULING POLICY
4.1 Presentation of query processing phases
4.2 Rewriting the client request
4.3 Locating useful data for the query
4.4 Query Scheduling
4.5 Execution plan and optimization of execution
4.6 Query execution and task scheduling
4.6.1 Our Scheduling Policy
4.6.2 Monitoring and updating the status of the execution
4.6.3 Assembly of the result
4.6.4 Scheduling Implementation
4.7 Conclusion
5 PROTOTYPE AND EXPERIMENTATION
5.1 Prototype Architecture
5.1.1 Our data model based on HBase
5.1.2 Presentation of the scheduling engine services for distributed storage
5.2 Prototype implementation
5.2.1 Hadoop/HBase deployment
5.2.2 OLAP Client Interface
5.2.3 Experiment Infrastructure
5.3 Experiments
5.3.1 Test Scenario
5.3.2 Stress Scenario
5.3.3 Results
5.4 Conclusion
6 CONCLUSIONS AND PERSPECTIVES
6.1 Evaluation and contributions
6.1.1 Identification and indexing of multidimensional data
6.1.2 Implementation and Query Optimization
6.1.3 Prototype of services
6.2 Limitations and perspectives
6.2.1 Management and maintenance of the distributed data warehouse
6.2.2 Maintenance and adaptation of CCB Index structures to changes in the distributed warehouse
6.2.3 Evolution and optimization of query processing method
6.2.4 Design and integration of methods through a service architecture
References
PUBLICATION
Acknowledgments
Dedication