Malaysian Journal of Computer Science

DMBVA - A Compression-Based Distributed Data Warehouse Management In Parallel Environment

Abstract

Parallel and distributed data warehouse architectures have evolved to support online queries on massive data in a short time. However, the emergence of e-applications has been creating extremely high volumes of data, reaching the terabyte threshold. Conventional data warehouse management systems are costly in terms of storage space and processing speed, and are sometimes unable to handle such huge amounts of data. As a result, there is a crucial need for new algorithms and techniques to store and manipulate these data. In this paper, we present a compression-based distributed data warehouse architecture, ‘DMBVA’, for storing warehouse data and supporting online queries efficiently. We achieve a compression factor of 25-30 compared to a SQL Server data warehouse. The main computational component of a data warehouse is the generation and querying of the data cube. Our algorithm, ‘PCVDC’, generates the data cube in parallel, directly from the compressed form of the data. The size of the data cube is reduced by a factor of 30-45 compared to existing methods. The response time is also significantly improved. These improvements are achieved through the elimination of suffix and prefix redundancy, the virtual nature of the data cube, direct addressability of the compressed form of the data, and parallel computation. Experimental evaluation shows improved performance over existing systems.
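As a rough illustration of the general idea of building a data cube in parallel from an encoded form of the data, the following Python sketch may help. It is not the paper's PCVDC algorithm or the DMBVA storage format, neither of which is specified in the abstract. It assumes plain dictionary encoding as a stand-in for compression, a single additive measure per row, and thread-based workers; the function names `dictionary_encode`, `partial_cube`, and `build_cube` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def dictionary_encode(rows, n_dims):
    """Replace each dimension value with a small integer code (assumed encoding)."""
    dictionaries = [{} for _ in range(n_dims)]
    encoded = []
    for row in rows:
        codes = tuple(
            dictionaries[d].setdefault(row[d], len(dictionaries[d]))
            for d in range(n_dims)
        )
        encoded.append(codes + (row[n_dims],))  # the measure stays unencoded
    return encoded, dictionaries


def partial_cube(partition, n_dims):
    """Aggregate one partition for every subset of dimensions (every cuboid)."""
    cube = defaultdict(int)
    for *codes, measure in partition:
        for k in range(n_dims + 1):
            for dims in combinations(range(n_dims), k):
                # Key: which dimensions are grouped, and their encoded values.
                cube[(dims, tuple(codes[d] for d in dims))] += measure
    return cube


def build_cube(encoded, n_dims, n_workers=2):
    """Split the encoded rows, aggregate each split in a worker, then merge."""
    partitions = [encoded[i::n_workers] for i in range(n_workers)]
    merged = defaultdict(int)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(lambda p: partial_cube(p, n_dims), partitions):
            for key, total in part.items():
                merged[key] += total
    return dict(merged)


# Toy fact table: (region, product, units_sold).
rows = [
    ("north", "pen", 3),
    ("north", "ink", 5),
    ("south", "pen", 2),
    ("south", "pen", 4),
]
encoded, dictionaries = dictionary_encode(rows, n_dims=2)
cube = build_cube(encoded, n_dims=2)

north = dictionaries[0]["north"]
pen = dictionaries[1]["pen"]
print(cube[((), ())])            # grand total over all rows
print(cube[((0,), (north,))])    # total for region = north
print(cube[((0, 1), (north, pen))])  # total for region = north, product = pen
```

The aggregation here works only on the integer codes, so the original values are never decoded while the cube is built. Python threads do not provide true CPU parallelism, so the pool only shows the partition-and-merge structure; a process pool or separate nodes would be needed for an actual speed-up.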