Supporting fault tolerance in a data-intensive computing middleware

Abstract

Over the last 2-3 years, the importance of data-intensive computing has increasingly been recognized, closely coupled with the emergence and popularity of map-reduce for developing this class of applications. Besides programmability and ease of parallelization, fault tolerance is clearly important for data-intensive applications, because of their long running nature, and because of the potential for using a large number of nodes for processing massive amounts of data. Fault-tolerance has been an important attribute of map-reduce as well in its Hadoop implementation, where it is based on replication of data in the file system. Two important goals in supporting fault-tolerance are low overheads and efficient recovery. With these goals, this paper describes a different approach for enabling data-intensive computing with fault-tolerance. Our approach is based on an API for developing data-intensive computations that is a variation of map-reduce, and it involves an explicit programmer-declared reduction object. We show how more efficient fault-tolerance support can be developed using this API. Particularly, as the reduction object represents the state of the computation on a node, we can periodically cache the reduction object from every node at another location and use it to support failure-recovery. We have extensively evaluated our approach using two data-intensive applications. Our results show that the overheads of our scheme are extremely low, and our system outperforms Hadoop both in absence and presence of failures.
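
The checkpointing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' middleware or API: the names `ReductionObject`, `run_node`, `store`, `every`, and `fail_at` are hypothetical, a word count stands in for the application, and an in-memory dictionary stands in for the remote cache location. The sketch only assumes what the abstract states, namely that the reduction object captures a node's computation state and is cached periodically so that a restarted node can resume from the last cached copy.

```python
import copy


class NodeFailure(Exception):
    """Simulated crash of a worker node (illustrative only)."""


class ReductionObject:
    """Programmer-declared accumulator; here a simple word counter."""

    def __init__(self):
        self.counts = {}

    def accumulate(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1


def run_node(node_id, chunks, store, every=2, fail_at=None):
    """Process chunks, caching the reduction object every `every` chunks.

    `store` stands in for the remote cache location (an assumption): it maps
    node_id to (index of next chunk, snapshot of the reduction object).
    """
    # Resume from the last cached state if one exists, else start fresh.
    next_chunk, cached = store.get(node_id, (0, ReductionObject()))
    robj = copy.deepcopy(cached)  # never mutate the cached snapshot
    for i in range(next_chunk, len(chunks)):
        if fail_at is not None and i == fail_at:
            raise NodeFailure(f"node {node_id} failed before chunk {i}")
        for word in chunks[i]:
            robj.accumulate(word)
        if (i + 1) % every == 0:
            # Periodic checkpoint: copy the state to the other location.
            store[node_id] = (i + 1, copy.deepcopy(robj))
    return robj


chunks = [["a", "b"], ["a"], ["c", "a"], ["b"]]
store = {}
try:
    run_node(0, chunks, store, every=2, fail_at=3)
except NodeFailure:
    pass  # chunks 0 and 1 were checkpointed; chunk 2 is redone on restart
recovered = run_node(0, chunks, store, every=2)
print(recovered.counts)
```

In this sketch only the work done since the last checkpoint is repeated after a failure, and the checkpoint is a copy of one small object rather than of the input data. How the actual system chooses the cache location and checkpoint interval is not described in this record.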

Bibliographic details

Source: IEEE International Symposium on Parallel & Distributed Processing, 2010, 12 pages
Authors: Bicer T.; Wei Jiang; Agrawal G.
Original format: PDF
Chinese Library Classification: TP311.138-53
Keywords: Cloud computing; Data-intensive computing; Fault tolerance; Map-Reduce

Similar literature

1. Schizophrenic Middleware Support for Fault Tolerance [J]. Khaled Barbaria, Laurent Pautet, Isabelle Perseil. Ada Letters, 2006, no. 3.
2. Middleware to Manage Fault Tolerance Using Semi-Coordinated Checkpoints [J]. Wong Alvaro, Heymann Elisa, Rexachs Dolores. IEEE Transactions on Parallel and Distributed Systems, 2021, no. 2.
3. Fault Tolerance Middleware for a Multi-Core System [J]. NASA Tech Briefs, 2012, no. 6.
4. Supporting fault tolerance in a data-intensive computing middleware [C]. Bicer T., Wei Jiang, Agrawal G. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
5. Robust integration of multi-level fault detection mechanisms and recovery mechanisms in a component-based support middleware model for fault-tolerant real-time distributed computing [D]. Zhou, Qian. 2009.
6. An improved ant colony optimization algorithm with fault tolerance for job scheduling in grid computing systems [O]. Hajara Idris, Absalom E. Ezugwu, Sahalu B. Junaidu.
7. Middleware Fault Tolerance Support for the BOSS Embedded Operating System [O]. Afonso Francisco, Silva Carlos A., Montenegro Sérgio. 2006.
