Journal: Journal of Parallel and Distributed Computing

GOM-Hadoop: A distributed framework for efficient analytics on ordered datasets


Abstract

Event log files are among the datasets most commonly used by corporations for business intelligence analysis. The records in event log files are often temporally ordered, and need to be grouped by a certain key with the temporal ordering preserved to facilitate further analysis. One such example is grouping temporally ordered events by user ID in order to analyze user behavior. This kind of analytical workload, here referred to as RElative Order-pReserving based Grouping (Re-Org), is quite common in big data analytics, where the MapReduce programming paradigm (and its open-source implementation, Hadoop) is widely adopted for massively parallel processing. However, using MapReduce/Hadoop to execute Re-Org tasks on ordered datasets is inefficient because of the internal sort-merge mechanism it applies when shuffling data from mappers to reducers. In this paper, we propose a distributed framework that adopts an efficient group-order-merge mechanism to speed up the execution of Re-Org tasks. We demonstrate the advantage of our framework by formally modeling its execution process and by comparing its performance with Hadoop through extensive experiments on real-world datasets. The evaluation results show that our framework can achieve a speedup of up to 6.3x over Hadoop in executing Re-Org tasks.
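
To make the Re-Org workload concrete, the following is a minimal single-process sketch in Python. It is not the paper's implementation: the function names, the sample log, and the way the input is split are all assumptions made for illustration. It shows only the underlying idea that, when the input is already temporally ordered, records can be grouped by key in one pass and per-split groups can be concatenated in split order, so no comparison sort is required.

```python
from collections import defaultdict


def group_preserving_order(records):
    """Group (key, value) records by key in a single pass.

    Values are appended in arrival order, so an input that is already
    temporally ordered yields temporally ordered groups without sorting.
    """
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


def merge_ordered_groups(partitions):
    """Merge per-split groups by concatenating them in split order.

    Assumption: every record in split i precedes every record in split i+1,
    so concatenation alone preserves the relative order within each key.
    """
    merged = defaultdict(list)
    for part in partitions:
        for key, values in part.items():
            merged[key].extend(values)
    return dict(merged)


# Hypothetical event log, already ordered by timestamp:
# each record is (user_id, (timestamp, action)).
log = [
    ("u1", (1, "login")),
    ("u2", (2, "login")),
    ("u1", (3, "click")),
    ("u2", (4, "logout")),
    ("u1", (5, "logout")),
]

# Two contiguous input splits, standing in for the inputs of two mappers.
splits = [log[:3], log[3:]]
result = merge_ordered_groups(group_preserving_order(s) for s in splits)

for user, events in result.items():
    print(user, events)
```

In this sketch the cost is linear in the number of records, whereas a sort-merge shuffle orders all records by key regardless of whether the input was already ordered.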
