ACM SIGMOD International Conference on Management of Data

A Comparison of Join Algorithms for Log Processing in MapReduce

Abstract

The MapReduce framework is increasingly being used to analyze large volumes of data. One important type of data analysis done with MapReduce is log processing, in which a click-stream or an event log is filtered, aggregated, or mined for patterns. As part of this analysis, the log often needs to be joined with reference data such as information about users. Although there have been many studies examining join algorithms in parallel and distributed DBMSs, the MapReduce framework is cumbersome for joins. MapReduce programmers often use simple but inefficient algorithms to perform joins. In this paper, we describe crucial implementation details of a number of well-known join strategies in MapReduce, and present a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster. Our results provide insights that are unique to the MapReduce platform and offer guidance on when to use a particular join algorithm on this platform.
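
The abstract does not name the join strategies it compares. As an illustration only, the following minimal sketch shows one commonly used approach, a reduce-side (repartition) join of a click log with a user reference table, simulated in memory in plain Python. The sample data, the `"U"`/`"L"` source tags, and all function names are assumptions made for this example; they are not taken from the paper and do not use the Hadoop API.

```python
from collections import defaultdict

# Hypothetical sample data: (user_id, name) reference rows and (user_id, url) log rows.
USERS = [(1, "alice"), (2, "bob")]
CLICKS = [(1, "/home"), (2, "/cart"), (1, "/checkout"), (3, "/home")]


def map_phase(users, clicks):
    """Tag each record with its source table and emit it under the join key."""
    for user_id, name in users:
        yield user_id, ("U", name)
    for user_id, url in clicks:
        yield user_id, ("L", url)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Pair every log record with every reference record that shares the key."""
    names = [v for tag, v in values if tag == "U"]
    urls = [v for tag, v in values if tag == "L"]
    for name in names:
        for url in urls:
            yield key, name, url


def repartition_join(users, clicks):
    """Run map, shuffle, and reduce in sequence and collect the joined rows."""
    groups = shuffle(map_phase(users, clicks))
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_phase(key, groups[key]))
    return joined


if __name__ == "__main__":
    for row in repartition_join(USERS, CLICKS):
        print(row)
```

In this sketch the click for user 3 produces no output because no reference row shares its key, so the result is an inner join. Every record from both inputs passes through the shuffle, which is one reason a join of this kind can be costly when the log is large.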
