Based on a study of multidimensional data warehouse models under distributed column storage, and observing that join and aggregation operations on such models often cause substantial data migration, this paper proposes an optimization method for multidimensional data warehouse models under column storage that is built on hierarchical encoding. The method combines global-domain and local-domain hierarchical encoding of dimension tables to binary-encode the hierarchy information held in the dimension tables of a conventional star schema. It then compresses that hierarchy information into the fact table, yielding a join-free star schema. A composite compression strategy tailored to the data characteristics of the new model is also proposed. Together, these aim to reduce the data migration introduced by OLAP operations under distributed column storage, reduce storage space, and improve query performance. Experimental results show that the optimization method is feasible and effective.
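The idea of folding dimension hierarchies into the fact table can be illustrated with a small sketch. The abstract does not give the paper's exact definitions of global-domain and local-domain encoding, so the following is an assumption-based illustration rather than the authors' method. Here a member's "local" code is its index among its siblings, each level gets a fixed bit width from its largest fan-out, and the codes are concatenated with the coarsest level in the high bits. The resulting bit prefix up to any level then acts as a globally unique code for that level, so a roll-up becomes a right shift and a GROUP BY on a higher level needs no join with the dimension table. All function names (`build_hierarchy_codes`, `encode`, `roll_up`, `aggregate`) and the sample data are hypothetical.

```python
from collections import defaultdict


def bits_for(n: int) -> int:
    """Number of bits needed to distinguish n values (at least 1 bit)."""
    return max(1, (n - 1).bit_length())


def build_hierarchy_codes(rows):
    """Hypothetical scheme: give each member a local code (its index among its
    parent's children) and a fixed bit width per level from the largest fan-out."""
    depth = len(rows[0])
    local = [{} for _ in range(depth)]                 # path prefix -> local index
    fanout = [defaultdict(int) for _ in range(depth)]  # parent prefix -> #children
    for row in rows:
        for lvl in range(depth):
            prefix = tuple(row[: lvl + 1])
            if prefix not in local[lvl]:
                parent = prefix[:-1]
                local[lvl][prefix] = fanout[lvl][parent]
                fanout[lvl][parent] += 1
    widths = [bits_for(max(fanout[lvl].values())) for lvl in range(depth)]
    return local, widths


def encode(row, local, widths):
    """Concatenate the local codes of every level, coarsest level in the high bits."""
    code = 0
    for lvl, width in enumerate(widths):
        code = (code << width) | local[lvl][tuple(row[: lvl + 1])]
    return code


def roll_up(code, widths, level):
    """Drop the bits of all levels finer than `level`; the remaining prefix
    identifies the ancestor at `level` uniquely across the whole dimension."""
    return code >> sum(widths[level + 1 :])


def aggregate(fact_rows, widths, level):
    """SUM(measure) GROUP BY the ancestor at `level`, with no dimension-table join."""
    totals = defaultdict(int)
    for code, measure in fact_rows:
        totals[roll_up(code, widths, level)] += measure
    return dict(totals)


# Hypothetical location dimension: country -> province -> city.
locations = [
    ("CN", "Hubei", "Wuhan"), ("CN", "Hubei", "Yichang"), ("CN", "Hunan", "Changsha"),
    ("US", "CA", "LA"), ("US", "CA", "SF"), ("US", "NY", "NYC"),
]
local, widths = build_hierarchy_codes(locations)
codes = {row[2]: encode(row, local, widths) for row in locations}

# The fact table keeps only the packed hierarchy code next to each measure.
sales = [(codes["Wuhan"], 10), (codes["Yichang"], 5), (codes["Changsha"], 7),
         (codes["LA"], 3), (codes["SF"], 4), (codes["NYC"], 6)]

print(widths)                       # [1, 1, 1]
print(aggregate(sales, widths, 0))  # {0: 22, 1: 13}  (by country)
print(aggregate(sales, widths, 1))  # {0: 15, 1: 7, 2: 7, 3: 6}  (by province)
```

In a column store, the packed code column is narrow and, once sorted, highly repetitive in its high bits, which is the kind of data characteristic a follow-up compression step could exploit. The specific composite compression strategy used in the paper is not described in the abstract and is not modeled here.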