Processing Theta-Joins using MapReduce


Abstract

Joins are essential for many data analysis tasks, but are not supported directly by the MapReduce paradigm. While there has been progress on equi-joins, implementation of join algorithms in MapReduce in general is not sufficiently understood. We study the problem of how to map arbitrary join conditions to Map and Reduce functions, i.e., a parallel infrastructure that controls data flow based on key-equality only. Our proposed join model simplifies creation of and reasoning about joins in MapReduce. Using this model, we derive a surprisingly simple randomized algorithm, called 1-Bucket-Theta, for implementing arbitrary joins (theta-joins) in a single MapReduce job. This algorithm only requires minimal statistics (input cardinality) and we provide evidence that for a variety of join problems, it is either close to optimal or the best possible option. For some of the problems where 1-Bucket-Theta is not the best choice, we show how to achieve better performance by exploiting additional input statistics. All algorithms can be made 'memory-aware', and they do not require any modifications to the MapReduce environment. Experiments show the effectiveness of our approach.
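
The abstract does not spell out how a randomized single-job theta-join works, so the following is only a minimal in-memory sketch of the general idea, not the authors' exact procedure. It assumes the reducers form a logical grid: each tuple of S is sent to every reducer in one randomly chosen grid row, and each tuple of T to every reducer in one randomly chosen grid column, so every (s, t) pair meets at exactly one reducer, where the join condition is tested. The names `one_bucket_theta_sketch`, `grid_rows`, and `grid_cols` are hypothetical, and the grid shape is simply passed in here rather than derived from input cardinalities.

```python
import random
from collections import defaultdict


def one_bucket_theta_sketch(S, T, theta, grid_rows, grid_cols, seed=0):
    """Simulate a randomized grid-partitioned theta-join in memory.

    Illustrative assumption: reducers are addressed by (row, col) keys,
    which is how key-equality alone can route tuples for any condition.
    """
    rng = random.Random(seed)
    # reducer key -> (list of S tuples, list of T tuples)
    reducers = defaultdict(lambda: ([], []))

    # "Map" phase: replicate each S tuple across one random grid row.
    for s in S:
        row = rng.randrange(grid_rows)
        for col in range(grid_cols):
            reducers[(row, col)][0].append(s)

    # "Map" phase: replicate each T tuple across one random grid column.
    for t in T:
        col = rng.randrange(grid_cols)
        for row in range(grid_rows):
            reducers[(row, col)][1].append(t)

    # "Reduce" phase: each reducer runs a local nested-loop join.
    # A pair (s, t) co-occurs only at (row of s, col of t), so no
    # result is produced twice.
    output = []
    for s_part, t_part in reducers.values():
        output.extend((s, t) for s in s_part for t in t_part if theta(s, t))
    return output


# Usage: an inequality join, checked against a brute-force join.
S = list(range(20))
T = list(range(15))
pairs = one_bucket_theta_sketch(S, T, lambda s, t: s < t, grid_rows=3, grid_cols=4)
assert sorted(pairs) == sorted((s, t) for s in S for t in T if s < t)
```

Because the routing ignores tuple values, the result is the same for any join condition; the price is that each input tuple is copied to a whole row or column of reducers.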

Bibliographic details

Source
International Conference on Management of Data | 2011 | pp. 949-960 | 12 pages
Authors
Alper Okcan; Mirek Riedewald;
Original format: PDF
Keywords
MapReduce; Theta Join Processing; Skew;


Similar literature

1. An efficient theta-join query processing in distributed environment [J]. Liu Wenjie, Li Zhanhuai. Journal of Parallel and Distributed Computing. 2018, Issue NOVa.
2. GPU processing of theta-joins [J]. Christos Bellas, Anastasios Gounaris. Concurrency and Computation. 2017, Issue 18.
3. Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling [J]. Rong Chen, Haibo Chen. ACM Transactions on Architecture and Code Optimization. 2013, Issue 1.
4. An Efficient Theta-Join Query Processing Algorithm on MapReduce Framework [C]. Chen Shih-Ying, Chang Tsui-Ping, Chang Zhi-Hong. Computer, Consumer and Control (IS3C), 2012 International Symposium on. 2012.
5. Processing Theta-Joins on Shared-Nothing Systems [D]. Okcan, Alper. 2014.
6. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data [O]. Yuxin Chen, Yongsheng Chen, Chunmei Shi.
7. Efficient Multi-way Theta-Join Processing Using MapReduce [O]. Xiaofei Zhang, Lei Chen, Min Wang. 2012.
