Conference: Sixth International Conference on Semantics Knowledge and Grid

Join Optimization in the MapReduce Environment for Column-wise Data Store

Abstract

Chain join processing, which sequentially combines records from two or more tables, has been well studied for centralized databases. However, it has seldom been discussed in the cloud computing era and remains an open problem, especially where structured (or relational) data are stored column-wise (by attribute) in distributed file systems (e.g., Google File System) spanning hundreds or even thousands of commodity PCs. In this paper, we propose a novel method for chain join processing, one of the common primitives for analyzing column-wise stored data in the cloud era. By effectively selecting only the records (tuples) needed for the chain join, based on information exploited from the bipartite join graph, the communication cost of record transmission can be reduced dramatically. A bushy tree structure is deployed to regulate the chain join sequence, which further reduces the number of intermediate results generated and transmitted and exposes higher parallelism, resulting in more efficient join processing. Our extensive performance study confirms the effectiveness and efficiency of our methods.
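
The two ideas in the abstract can be illustrated with a small single-process sketch. This is not the authors' algorithm or a MapReduce implementation: it swaps in a classic semi-join style reduction to stand in for "selecting only the needed records", and a midpoint split to stand in for the bushy join order. The function names (`prune_dangling`, `bushy_chain_join`), the toy tables, and the assumption that adjacent tables share exactly one identically named join column are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, int]
Table = List[Row]


def hash_join(left: Table, right: Table, key: str) -> Table:
    """Equi-join two tables on a shared column using a hash index."""
    index: Dict[int, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def prune_dangling(tables: List[Table], keys: List[str]) -> List[Table]:
    """Drop tuples that cannot appear in the final chain join result.

    Assumption: tables[i] joins tables[i + 1] on column keys[i].
    A forward then a backward pass over the chain keeps only tuples
    whose join values have a match in the neighbouring table.
    """
    tables = [list(t) for t in tables]
    for i, key in enumerate(keys):  # forward pass
        live = {r[key] for r in tables[i]}
        tables[i + 1] = [r for r in tables[i + 1] if r[key] in live]
    for i in range(len(keys) - 1, -1, -1):  # backward pass
        live = {r[keys[i]] for r in tables[i + 1]}
        tables[i] = [r for r in tables[i] if r[keys[i]] in live]
    return tables


def bushy_chain_join(tables: List[Table], keys: List[str]) -> Table:
    """Join a chain by splitting at the midpoint, giving a bushy tree.

    The two halves are independent sub-joins, so in a distributed
    setting they could run in parallel (here they run sequentially).
    """
    def join(lo: int, hi: int) -> Table:
        if lo == hi:
            return tables[lo]
        mid = (lo + hi) // 2
        return hash_join(join(lo, mid), join(mid + 1, hi), keys[mid])

    return join(0, len(tables) - 1)


# Toy chain R(a, b) - S(b, c) - T(c, d) - U(d, e); values are made up.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
S = [{"b": 10, "c": 100}, {"b": 20, "c": 200}, {"b": 40, "c": 400}]
T = [{"c": 100, "d": 7}, {"c": 300, "d": 8}]
U = [{"d": 7, "e": 70}, {"d": 9, "e": 90}]
keys = ["b", "c", "d"]

reduced = prune_dangling([R, S, T, U], keys)
result = bushy_chain_join(reduced, keys)
print([len(t) for t in reduced], result)
```

In this sketch the pruning step shrinks every input before any join runs, which is the effect the abstract attributes to record selection: fewer tuples would need to be shipped between nodes. The midpoint split joins (R, S) and (T, U) separately before combining them, so intermediate results stay small and the two sub-joins do not depend on each other.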
