Venue: International Conference on Management of Data

RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures

Abstract

The MapReduce framework is typically deployed on very large computing clusters, where task and node failures are no longer the exception but the rule. Fault tolerance is therefore essential to the efficient operation of MapReduce jobs. However, current MapReduce implementations fully recompute failed tasks (sub-parts of a job) from the beginning, which can significantly degrade the runtime performance of MapReduce applications. We present an alternative system that implements the RAFT ideas [10]. RAFT is a family of powerful and inexpensive Recovery Algorithms for Fast-Tracking MapReduce jobs under task and node failures. To recover from task failures, RAFT exploits the intermediate results that MapReduce persists at several points in time, piggybacking checkpoints on the task progress computation. To recover from node failures, RAFT maintains, for each map task, a list of all input key-value pairs that produce intermediate results, and pushes intermediate results to reducers. In this demo, we show that RAFT recovers efficiently from both task and node failures. In addition, the audience can compare RAFT with Hadoop through an easy-to-use web interface.
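
The task-failure idea in the abstract (resume from persisted intermediate results instead of recomputing from the beginning) can be illustrated with a toy sketch. This is not the RAFT implementation or any Hadoop API: the names `Checkpoint`, `run_map_task`, `spill_every`, and `fail_at` are hypothetical, the "persisted" spills are kept in memory, and the node-failure mechanism is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

KV = Tuple[str, int]


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: input offset whose output is already persisted."""
    offset: int = 0
    spills: List[List[KV]] = field(default_factory=list)


def run_map_task(
    records: List[str],
    map_fn: Callable[[str], List[KV]],
    checkpoint: Checkpoint,
    spill_every: int = 2,
    fail_at: Optional[int] = None,
) -> int:
    """Run one attempt of a map task, starting at checkpoint.offset.

    Every `spill_every` records the buffered output is "persisted" as a
    spill and the checkpoint offset advances with it. Returns the number
    of records processed in this attempt.
    """
    buffer: List[KV] = []
    since_spill = 0
    processed = 0
    for i in range(checkpoint.offset, len(records)):
        if fail_at is not None and i == fail_at:
            # Unspilled buffer contents are lost, as with a crashed task.
            raise RuntimeError(f"simulated task failure at record {i}")
        buffer.extend(map_fn(records[i]))
        processed += 1
        since_spill += 1
        if since_spill == spill_every:
            checkpoint.spills.append(buffer)
            checkpoint.offset = i + 1
            buffer, since_spill = [], 0
    if since_spill:
        # Final flush of the remaining buffered output.
        checkpoint.spills.append(buffer)
        checkpoint.offset = len(records)
    return processed


def collect_output(checkpoint: Checkpoint) -> List[KV]:
    """Concatenate all persisted spills in order."""
    return [kv for spill in checkpoint.spills for kv in spill]


def word_count_map(line: str) -> List[KV]:
    """Example map function: emit (word, 1) for each word in the line."""
    return [(word, 1) for word in line.split()]
```

In this sketch, a second attempt that reuses the same `Checkpoint` after a failure only reprocesses the records whose output had not yet been spilled, whereas a fresh `Checkpoint` models the full recomputation that the abstract attributes to current implementations.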
