RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures

Abstract

The MapReduce framework is typically deployed on very large computing clusters where task and node failures are no longer the exception but the rule. Thus, fault-tolerance is an important aspect for the efficient operation of MapReduce jobs. However, current MapReduce implementations fully recompute failed tasks (sub-parts of a job) from the beginning. This can significantly decrease the runtime performance of MapReduce applications. We present an alternative system that implements RAFT ideas [10]. RAFT is a family of powerful and inexpensive Recovery Algorithms for Fast-Tracking MapReduce jobs under task and node failures. To recover from task failures, RAFT exploits the intermediate results persisted by MapReduce at several points in time. RAFT piggybacks checkpoints on the task progress computation. To recover from node failures, RAFT maintains a per-map task list of all input key-value pairs producing intermediate results and pushes intermediate results to reducers. In this demo, we demonstrate that RAFT recovers efficiently from both task and node failures. Further, the audience can compare RAFT with Hadoop via an easy-to-use web interface.
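
The task-failure idea in the abstract (resume from intermediate results that were already persisted, rather than recomputing from the beginning) can be illustrated with a minimal sketch. This is not RAFT's implementation and does not use any Hadoop API: the names `Checkpoint`, `run_map_task`, `spill_every`, and `fail_at` are hypothetical, and the "disk" is an in-memory list. The only assumption taken from the abstract is that a checkpoint is recorded together with each persisted batch of intermediate results.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    # Hypothetical structure: index of the first input record not yet covered
    # by persisted output, plus the output persisted so far.
    next_offset: int = 0
    persisted: list = field(default_factory=list)


def run_map_task(records, map_fn, checkpoint, spill_every=3, fail_at=None):
    """Run one attempt of a map task, starting at checkpoint.next_offset.

    Returns the number of input records processed in this attempt.
    fail_at simulates a task failure just before that record is processed.
    """
    buffer = []      # in-memory output, lost if the attempt fails
    processed = 0
    for offset in range(checkpoint.next_offset, len(records)):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError(f"simulated task failure at record {offset}")
        buffer.extend(map_fn(records[offset]))
        processed += 1
        if (offset + 1) % spill_every == 0:
            # Persist the buffered output and advance the checkpoint together,
            # so a retry never repeats work whose output is already saved.
            checkpoint.persisted.extend(buffer)
            checkpoint.next_offset = offset + 1
            buffer = []
    # Final flush at the end of the input.
    checkpoint.persisted.extend(buffer)
    checkpoint.next_offset = len(records)
    return processed


def word_pairs(line):
    # Toy map function: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]
```

In this sketch, a retry that reuses the same `Checkpoint` object skips every record whose output was already persisted, while a retry with a fresh `Checkpoint()` models the recompute-from-scratch behaviour the abstract attributes to current MapReduce implementations.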

Bibliographic details

Source
International Conference on Management of Data, 2011, 3 pages
Authors
Jorge-Arnulfo Quiane-Ruiz; Christoph Pinkel; Joerg Schad; Jens Dittrich
Original format
PDF
Chinese Library Classification
Programming; software engineering
Keywords
MapReduce; Hadoop; Node Failures; Fault-Tolerance; Recovery

Similar literature

1. A task-level adaptive MapReduce framework for real-time streaming data in healthcare applications [J]. Fan Zhang, Junwei Cao, Samee U. Khan. Future Generation Computer Systems. 2015
2. Speeding-up codon analysis on the cloud with local MapReduce aggregation [J]. Atanas Radenski, Louis Ehwerhemuepha. Information Sciences: An International Journal. 2014
3. Task failure resilience technique for improving the performance of MapReduce in Hadoop [J]. Kavitha C, Anita X. ETRI Journal. 2020, no. 5
4. RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures [C]. Jorge-Arnulfo Quiane-Ruiz, Christoph Pinkel, Joerg Schad. International Conference on Management of Data. 2011
5. A software system for user application tolerance of network and computing node failures [D]. Myers, Byron James. 2002
6. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends [O]. Emad A Mohammed, Behrouz H Far, Christopher Naugler. 2014
7. Approximate zero-variance importance sampling for static network reliability estimation with node failures and application to rail systems [O]. Rai, Ajit; Valenzuela, Rene; Tuffin, Bruno. 2016
8. Small computational node embedded within a high-speed network fabric for spacecraft flight applications [R]. Watson, R. K.; Petras, R. D.; Bolotin, G. S. 2003
