Supporting fault tolerance in a data-intensive computing middleware


Abstract

Over the last 2-3 years, the importance of data-intensive computing has increasingly been recognized, closely coupled with the emergence and popularity of map-reduce for developing this class of applications. Besides programmability and ease of parallelization, fault tolerance is clearly important for data-intensive applications, because of their long running nature, and because of the potential for using a large number of nodes for processing massive amounts of data. Fault-tolerance has been an important attribute of map-reduce as well in its Hadoop implementation, where it is based on replication of data in the file system. Two important goals in supporting fault-tolerance are low overheads and efficient recovery. With these goals, this paper describes a different approach for enabling data-intensive computing with fault-tolerance. Our approach is based on an API for developing data-intensive computations that is a variation of map-reduce, and it involves an explicit programmer-declared reduction object. We show how more efficient fault-tolerance support can be developed using this API. Particularly, as the reduction object represents the state of the computation on a node, we can periodically cache the reduction object from every node at another location and use it to support failure-recovery. We have extensively evaluated our approach using two data-intensive applications. Our results show that the overheads of our scheme are extremely low, and our system outperforms Hadoop both in absence and presence of failures.
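
The recovery idea in the abstract (periodically cache each node's reduction object at another location, then resume from it after a failure) can be illustrated with a minimal sketch. This is not the paper's API, which the abstract does not spell out. `CheckpointStore`, `process`, `run_with_recovery`, the word-count workload, and the `checkpoint_every` and `fail_at` parameters are all hypothetical names chosen for illustration, and the "other location" is simulated by an in-memory dictionary.

```python
import copy
from collections import Counter


class CheckpointStore:
    """Hypothetical stand-in for the 'other location' that caches reduction objects."""

    def __init__(self):
        self._snapshots = {}

    def save(self, node_id, next_index, reduction_object):
        # Deep-copy so later updates on the node do not alter the cached snapshot.
        self._snapshots[node_id] = (next_index, copy.deepcopy(reduction_object))

    def load(self, node_id):
        # Returns (next_index, reduction_object); a fresh state if nothing is cached.
        return self._snapshots.get(node_id, (0, Counter()))


def process(node_id, chunks, store, checkpoint_every=2, fail_at=None):
    """Fold data chunks into a reduction object, resuming from the last snapshot."""
    start, cached = store.load(node_id)
    robj = copy.deepcopy(cached)  # the explicit reduction object (here: word counts)
    for i in range(start, len(chunks)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure on node {node_id} at chunk {i}")
        for word in chunks[i].split():
            robj[word] += 1  # local reduction: update the state in place
        if (i + 1) % checkpoint_every == 0:
            store.save(node_id, i + 1, robj)  # periodic caching of the state
    return robj


def run_with_recovery(node_id, chunks, store, fail_at=None):
    """Run once; on failure, restart from the cached reduction object."""
    try:
        return process(node_id, chunks, store, fail_at=fail_at)
    except RuntimeError:
        # Only chunks processed after the last snapshot are redone.
        return process(node_id, chunks, store)


if __name__ == "__main__":
    data = ["a b", "b c", "c a", "a a", "b"]
    print(run_with_recovery("node-0", data, CheckpointStore(), fail_at=3))
```

Because the reduction object is the whole state of the computation on a node, the snapshot stays small and recovery only repeats the work done since the last checkpoint. In this sketch the checkpoint interval trades caching overhead against the amount of work redone after a failure.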

Bibliographic details

Source
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) | 2010 | pp. 1-12 | 12 pages
Conference location: Atlanta, GA (US)
Authors
Bicer T.; Wei Jiang; Agrawal G.;
Author affiliation

Dept. of Comput. Sci. Eng., Ohio State Univ., Columbus, OH, USA;
Original format: PDF
Language: English
Chinese Library Classification (CLC): TP311.133
Keywords
Cloud computing; Data-intensive computing; Fault tolerance; Map-Reduce;

Similar literature

1. Schizophrenic Middleware Support for Fault Tolerance [J]. Khaled Barbaria, Laurent Pautet, Isabelle Perseil. Ada Letters, 2006, No. 3.
2. Middleware to Manage Fault Tolerance Using Semi-Coordinated Checkpoints [J]. Wong Alvaro, Heymann Elisa, Rexachs Dolores. IEEE Transactions on Parallel and Distributed Systems, 2021, No. 2.
3. Fault Tolerance Middleware for a Multi-Core System [J]. NASA Tech Briefs, 2012, No. 6.
4. Supporting fault tolerance in a data-intensive computing middleware [C]. Bicer Tekin, Jiang Wei, Agrawal Gagan. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
5. Robust integration of multi-level fault detection mechanisms and recovery mechanisms in a component-based support middleware model for fault-tolerant real-time distributed computing [D]. Zhou, Qian. 2009.
6. An improved ant colony optimization algorithm with fault tolerance for job scheduling in grid computing systems [O]. Hajara Idris, Absalom E. Ezugwu, Sahalu B. Junaidu.
7. Middleware Fault Tolerance Support for the BOSS Embedded Operating System [O]. Afonso Francisco, Silva Carlos A., Montenegro Sérgio. 2006.
