International Journal of Grid and High Performance Computing

Optimizing Communication for Multi-Join Query Processing in Cloud Data Warehouses

Abstract

In this paper, the authors present two storage structures, PK-map and Tuple-index-map, to improve the performance of query execution and inter-node communication in Cloud Data Warehouses. Cloud Data Warehouses require read-optimized databases because large amounts of historical data are integrated on a regular basis to support analytical applications for report generation, future analysis, and decision-making. This frequent data integration can grow the data size rapidly, so resources need to be allocated dynamically on demand. As resources are scaled out in the cloud environment, the number of nodes involved in executing a query increases, which in turn increases the number of inter-node communications. In queries, join operations between two different tables are the most common. To perform the join operation of a query in the cloud environment, data must be transferred among different nodes. This becomes critical when a huge amount of data (terabytes or petabytes) is stored across a large number of nodes. As the number of nodes and the amount of data increase, the size of the communication messages also increases, resulting in higher bandwidth usage and performance degradation. The authors show through extensive experiments on PlanetLab Cloud that their proposed storage structures, PK-map and Tuple-index-map, together with their query execution algorithms, improve the performance of join queries and decrease inter-node communication and workload in Cloud Data Warehouses.
