Join Optimization in the MapReduce Environment for Column-wise Data Store

Abstract

Chain join processing, which sequentially combines records from two or more tables, has been well studied in centralized databases. However, it has seldom been discussed in the cloud computing era and remains an urgent open problem, especially when structured (relational) data are stored column-wise (attribute by attribute) in distributed file systems (e.g., the Google File System) spread over hundreds or even thousands of commodity PCs. In this paper, we propose a novel method for chain join processing, one of the common primitives for analyzing column-wise stored data in the cloud. By using the information captured in a bipartite join graph to select only the records (tuples) actually needed for the chain join, the communication cost of transmitting records can be reduced dramatically. A bushy tree structure is used to order the chain join sequence, which further reduces the number of intermediate results generated and transmitted and exposes more parallelism in join processing, resulting in more efficient join processing overall. Our extensive performance study confirms the effectiveness and efficiency of our methods.
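The abstract does not spell out the algorithm, so the following is only a rough illustration of the two ideas it names: dropping tuples that have no partner in the join graph before they are shipped, and evaluating a chain join as a bushy tree rather than a left-deep one. It is not the authors' method. It assumes each table is an in-memory list of (left key, right key) tuples, uses a classic forward/backward semijoin pass as a stand-in for the paper's bipartite-join-graph record selection, and all names (`prune_dangling`, `join_adjacent`, `bushy_chain_join`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: a chain query R1 ⋈ R2 ⋈ ... ⋈ Rn in which every
# tuple's last value joins with the first value of a tuple in the next
# relation. Base tables hold (left_key, right_key) pairs.
Relation = List[Tuple[int, ...]]


def prune_dangling(chain: List[Relation]) -> List[Relation]:
    """Keep only tuples that take part in at least one full chain result.

    The join between two adjacent tables forms a bipartite graph over their
    tuples; a forward and a backward semijoin pass over the chain removes every
    tuple lacking a partner (a standard full reducer for chain queries).
    This is a stand-in, not the paper's selection technique.
    """
    rels = [list(r) for r in chain]
    # Forward pass: R_{i+1} keeps tuples whose left key matches some R_i right key.
    for i in range(len(rels) - 1):
        keys = {t[-1] for t in rels[i]}
        rels[i + 1] = [t for t in rels[i + 1] if t[0] in keys]
    # Backward pass: R_i keeps tuples whose right key matches some R_{i+1} left key.
    for i in range(len(rels) - 2, -1, -1):
        keys = {t[0] for t in rels[i + 1]}
        rels[i] = [t for t in rels[i] if t[-1] in keys]
    return rels


def join_adjacent(a: Relation, b: Relation) -> Relation:
    """Hash join two neighbouring chain segments on a's last value = b's first."""
    index: Dict[int, Relation] = {}
    for t in b:
        index.setdefault(t[0], []).append(t)
    # The shared key appears only once in each output tuple.
    return [s + t[1:] for s in a for t in index.get(s[-1], [])]


def bushy_chain_join(chain: List[Relation]) -> Tuple[Relation, int]:
    """Evaluate the chain as a balanced bushy tree.

    Each round joins disjoint pairs of neighbouring segments; joins within a
    round are independent, so they could run as parallel tasks. Returns the
    result and the number of rounds: ceil(log2 n), versus n - 1 sequential
    joins for a left-deep plan.
    """
    if not chain:
        return [], 0
    level = [list(r) for r in chain]
    rounds = 0
    while len(level) > 1:
        nxt = [join_adjacent(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired segment waits for the next round
        level = nxt
        rounds += 1
    return level[0], rounds


chain = [
    [(1, 10), (2, 20), (3, 30)],  # R1
    [(10, 100), (20, 200)],       # R2
    [(100, 7), (300, 8)],         # R3
    [(7, 0), (9, 0)],             # R4
]
reduced = prune_dangling(chain)
print(reduced)                    # [[(1, 10)], [(10, 100)], [(100, 7)], [(7, 0)]]
print(bushy_chain_join(reduced))  # ([(1, 10, 100, 7, 0)], 2)
```

In this toy example, pruning leaves one tuple per table, so the first-round joins touch far fewer records than the unpruned ones would, while the final result is unchanged. On a cluster, that reduction is what would cut network transfer. For four tables, the bushy plan needs two rounds of independent joins, whereas a left-deep plan would need three sequential joins.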

Bibliographic details

Source
Sixth International Conference on Semantics Knowledge and Grid | 2010 | pp. 97-104 | 8 pages
Authors
Zhou Minqi; Zhang Rong; Zeng Dadan; Qian Weining; Zhou Aoying
Original format: PDF
Chinese Library Classification: Program design

Similar literature

1. Introducing extreme data storage middleware of schema-free document stores using MapReduce [J]. Ma Kun, Yang Bo. International Journal of Ad Hoc and Ubiquitous Computing. 2015, No. 4

2. DSMC: A Novel Distributed Store-Retrieve Approach of Internet Data Using MapReduce Model and Community Detection in Big Data [J]. Xu Xu, Jia Zhao, Gaochao Xu, et al. International Journal of Distributed Sensor Networks. 2014, No. 1

3. Optimizing Hash Join with MapReduce on Multi-Core CPUs [J]. Tong Yuan, Zhijing Liu, Hui Liu. IEICE Transactions on Information and Systems. 2016, No. 5

4. Join Optimization in the MapReduce Environment for Column-wise Data Store [C]. Zhou Minqi, Zhang Rong, Zeng Dadan, et al. International Conference on Semantics, Knowledge and Grid. 2010

5. High performance integration of data parallel file systems and computing: Optimizing MapReduce [D]. Guo, Zhenhua. 2012

6. MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data [O]. Jingjing Wang, Chen Lin. 2015

7. Temporal Data Management and Incremental Data Recomputation with Wide-column Stores and MapReduce [O]. Hu Yong. 2017
