A Comparison of Join Algorithms for Log Processing in MapReduce

Abstract

The MapReduce framework is increasingly being used to analyze large volumes of data. One important type of data analysis done with MapReduce is log processing, in which a click-stream or an event log is filtered, aggregated, or mined for patterns. As part of this analysis, the log often needs to be joined with reference data such as information about users. Although there have been many studies examining join algorithms in parallel and distributed DBMSs, the MapReduce framework is cumbersome for joins. MapReduce programmers often use simple but inefficient algorithms to perform joins. In this paper, we describe crucial implementation details of a number of well-known join strategies in MapReduce, and present a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster. Our results provide insights that are unique to the MapReduce platform and offer guidance on when to use a particular join algorithm on this platform.

Bibliographic details

Source: ACM SIGMOD International Conference on Management of Data, 2010, 12 pages
Original format: PDF
Chinese Library Classification: Programming; software engineering
Keywords: MapReduce; Hadoop; join processing; analytics

Similar literature

1. Load balancing in join algorithms for skewed data in MapReduce systems [J]. Gavagsaz Elaheh, Rezaee Ali, Javadi Hamid Haj Seyyed. Journal of Supercomputing. 2019, No. 1.
2. Grid-Based Parallel Algorithms of Join Queries for Analyzing Multi-Dimensional Data on MapReduce [J]. Miyoung Jang, Jae-Woo Chang. IEICE Transactions on Information and Systems. 2018, No. 4.
3. Handling data skew in join algorithms using MapReduce [J]. Myung Jaeseok, Shim Junho, Yeon Jongheum, et al. Expert Systems with Applications. 2016 (Jun.).
4. A Comparison of Join Algorithms for Log Processing in MapReduce [C]. Spyros Blanas, Jignesh M. Patel, Vuk Ercegovac, et al. ACM SIGMOD International Conference on Management of Data (SIGMOD 2010). 2010.
5. Efficient structural join processing algorithms [D]. Liu, Kaiyang. 2005.
6. Comparison of Different Post-Processing Algorithms for Dynamic Susceptibility Contrast Perfusion Imaging of Cerebral Gliomas [O]. Kohsuke Kudo, Ikuko Uwano, Toshinori Hirai, et al. 2017.
7. Survey on Join Algorithms and Task Scheduling Algorithms for Hadoop MapReduce Platform [O]. 2017.
