International Conference on Management of Data

RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures

Abstract

The MapReduce framework is typically deployed on very large computing clusters, where task and node failures are no longer the exception but the rule. Fault tolerance is therefore essential to the efficient operation of MapReduce jobs. However, current MapReduce implementations recompute failed tasks (sub-parts of a job) from the beginning, which can significantly degrade the runtime performance of MapReduce applications. We present an alternative system that implements the RAFT ideas [10]. RAFT is a family of powerful and inexpensive Recovery Algorithms for Fast-Tracking MapReduce jobs under task and node failures. To recover from task failures, RAFT exploits the intermediate results that MapReduce persists at several points in time, and it piggybacks checkpoints on the task progress computation. To recover from node failures, RAFT maintains, for each map task, a list of all input key-value pairs that produce intermediate results, and it pushes intermediate results to reducers. In this demo, we show that RAFT recovers efficiently from both task and node failures. In addition, the audience can compare RAFT with Hadoop through an easy-to-use web interface.
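The task-failure idea in the abstract can be illustrated with a small sketch. This is not the RAFT implementation and does not use any Hadoop API; `Checkpoint`, `run_map_task`, `spill_every`, and `fail_at` are hypothetical names chosen for illustration. The sketch assumes that a checkpoint is simply the next unread input offset recorded together with the output persisted so far, so that a retried map task resumes from that offset instead of starting over.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: next input offset plus output persisted so far."""
    next_offset: int = 0
    spilled: list = field(default_factory=list)


def run_map_task(records, map_fn, checkpoint, spill_every=3, fail_at=None):
    """Run one attempt of a map task, resuming from `checkpoint`.

    Raises RuntimeError when reaching input offset `fail_at`, which
    simulates a task failure. Output still in the in-memory buffer is
    lost on failure; output already spilled survives in the checkpoint.
    """
    buffer = []  # map output that has not been persisted yet
    for offset in range(checkpoint.next_offset, len(records)):
        if offset == fail_at:
            raise RuntimeError(f"simulated task failure at offset {offset}")
        buffer.extend(map_fn(records[offset]))
        if (offset + 1) % spill_every == 0:
            # Persist the buffer and record progress in the same step,
            # so the checkpoint rides along with the spill.
            checkpoint.spilled.extend(buffer)
            checkpoint.next_offset = offset + 1
            buffer.clear()
    # Final spill once all input has been consumed.
    checkpoint.spilled.extend(buffer)
    checkpoint.next_offset = len(records)
    return checkpoint.spilled


def word_count_map(line):
    """Toy map function: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def demo():
    """Fail once at offset 7, then retry from the last checkpoint."""
    records = ["a b", "b c", "c d", "d e", "e f", "f g", "g h", "h i"]
    calls = []  # records actually passed to the map function

    def counting_map(line):
        calls.append(line)
        return word_count_map(line)

    ckpt = Checkpoint()
    try:
        run_map_task(records, counting_map, ckpt, fail_at=7)
    except RuntimeError:
        pass
    first_attempt = len(calls)
    output = run_map_task(records, counting_map, ckpt)
    retry = len(calls) - first_attempt
    return first_attempt, retry, output
```

In this toy run the retry reprocesses only the records after the last spill, whereas recomputing from the beginning would reprocess all eight. The real system additionally has to handle node failures, which this sketch does not cover.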
