RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures


Abstract

The MapReduce framework is typically deployed on very large computing clusters, where task and node failures are the rule rather than the exception. Fault tolerance is therefore an important aspect of running MapReduce jobs efficiently. However, current MapReduce implementations recompute failed tasks (sub-parts of a job) from scratch, which can significantly degrade the runtime performance of MapReduce applications. We present an alternative system that implements the RAFT ideas [10]. RAFT is a family of powerful and inexpensive Recovery Algorithms for Fast-Tracking MapReduce jobs under task and node failures. To recover from task failures, RAFT exploits the intermediate results that MapReduce persists at several points in time, piggybacking checkpoints on the computation of task progress. To recover from node failures, RAFT maintains, for each map task, a list of all input key-value pairs that produce intermediate results, and it pushes intermediate results to reducers. In this demo, we show that RAFT recovers efficiently from both task and node failures. Furthermore, the audience can compare RAFT with Hadoop through an easy-to-use web interface.
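
The abstract describes task-failure recovery that resumes from checkpoints piggybacked on progress reporting, rather than recomputing a failed task from the beginning. The Python sketch below illustrates only that general idea on a toy in-memory input split. It is not RAFT's or Hadoop's implementation. The names `Checkpoint`, `run_map_task` and `word_count_map`, the checkpoint interval, and the simulated failure are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """State persisted by a hypothetical map task during its progress reports."""
    next_offset: int = 0                        # first input record not yet covered
    output: list = field(default_factory=list)  # intermediate (key, value) pairs persisted so far


def word_count_map(record):
    """Example map function: emit (word, 1) for each word in a line."""
    return [(word, 1) for word in record.split()]


def run_map_task(split, map_fn, checkpoint, checkpoint_every=2, fail_at=None):
    """Run a map task over `split`, resuming at `checkpoint.next_offset`.

    Every `checkpoint_every` records (standing in for a progress report), the
    buffered output and the current offset are saved into `checkpoint`.
    If `fail_at` equals the current offset, a RuntimeError simulates a task
    failure; anything buffered since the last checkpoint is lost.
    Returns (all persisted output, number of records processed in this attempt).
    """
    buffer = []
    processed = 0
    offset = checkpoint.next_offset
    while offset < len(split):
        if offset == fail_at:
            raise RuntimeError(f"simulated task failure at record {offset}")
        buffer.extend(map_fn(split[offset]))
        offset += 1
        processed += 1
        if offset % checkpoint_every == 0:
            checkpoint.output.extend(buffer)  # persist intermediate results
            checkpoint.next_offset = offset   # remember where to resume
            buffer = []
    # Task finished: persist the tail that did not reach a checkpoint boundary.
    checkpoint.output.extend(buffer)
    checkpoint.next_offset = offset
    return checkpoint.output, processed


if __name__ == "__main__":
    split = ["a b", "b c", "c d", "d e", "e f"]
    ckpt = Checkpoint()
    try:
        run_map_task(split, word_count_map, ckpt, fail_at=3)
    except RuntimeError as err:
        print(err)
    # The retry starts at record 2 (the last checkpoint), not at record 0.
    output, redone = run_map_task(split, word_count_map, ckpt)
    print(f"records reprocessed on retry: {redone} of {len(split)}")
```

In this toy run, a checkpoint is taken every two records. The retry reprocesses 3 of the 5 records instead of all 5, and its persisted output matches a failure-free run. Checkpoint frequency is the main trade-off. More frequent checkpoints add work during normal execution but leave less to redo after a failure.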

Bibliographic Details

Source
International Conference on Management of Data, 2011, pp. 1225-1227 (3 pages)
Authors
Jorge-Arnulfo Quiane-Ruiz; Christoph Pinkel; Joerg Schad; Jens Dittrich
Original format: PDF
Keywords
MapReduce; Hadoop; Node Failures; Fault Tolerance; Recovery

Similar Literature

1. A task-level adaptive MapReduce framework for real-time streaming data in healthcare applications [J]. Fan Zhang, Junwei Cao, Samee U. Khan. Future Generation Computer Systems, 2015.

2. Speeding-up codon analysis on the cloud with local MapReduce aggregation [J]. Atanas Radenski, Louis Ehwerhemuepha. Information Sciences: An International Journal, 2014.

3. Task failure resilience technique for improving the performance of MapReduce in Hadoop [J]. Kavitha C, Anita X. ETRI Journal, 2020, no. 5.

4. RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures [C]. Jorge-Arnulfo Quiane-Ruiz, Christoph Pinkel, Joerg Schad. International Conference on Management of Data, 2011.

5. A software system for user application tolerance of network and computing node failures [D]. Byron James Myers, 2002.

6. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends [O]. Emad A Mohammed, Behrouz H Far, Christopher Naugler, 2014.

7. Approximate zero-variance importance sampling for static network reliability estimation with node failures and application to rail systems [O]. Ajit Rai, Rene Valenzuela, Bruno Tuffin, 2016.

8. Small computational node embedded within a high-speed network fabric for spacecraft flight applications [R]. R. K. Watson, R. D. Petras, G. S. Bolotin, 2003.
