2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Supporting fault tolerance in a data-intensive computing middleware

Abstract

Over the last two to three years, the importance of data-intensive computing has been increasingly recognized, closely coupled with the emergence and popularity of map-reduce for developing this class of applications. Besides programmability and ease of parallelization, fault tolerance is clearly important for data-intensive applications, because they are long-running and may use a large number of nodes to process massive amounts of data. Fault tolerance has also been an important attribute of map-reduce in its Hadoop implementation, where it is based on replicating data in the file system. Two important goals in supporting fault tolerance are low overheads and efficient recovery. With these goals in mind, this paper describes a different approach to fault-tolerant data-intensive computing. Our approach is based on an API for developing data-intensive computations that is a variation of map-reduce and involves an explicit, programmer-declared reduction object. We show how this API enables more efficient fault-tolerance support. In particular, because the reduction object represents the state of the computation on a node, we can periodically cache each node's reduction object at another location and use it for failure recovery. We have extensively evaluated our approach using two data-intensive applications. Our results show that the overheads of our scheme are extremely low, and that our system outperforms Hadoop both in the absence and in the presence of failures.
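The abstract describes the recovery mechanism only in outline. The Python sketch below illustrates the idea under explicit assumptions: each node folds input elements directly into a programmer-declared reduction object, periodically caches a copy of that object (together with how many input chunks it covers) at another location, and after a failure a replacement restores the cached copy and reprocesses only the chunks after the last checkpoint. All names (`ReductionObject`, `Checkpoint`, `Node`, `run_job`), the word-count reduction, the checkpoint interval, and the in-memory dict standing in for "another location" are hypothetical illustrations, not the paper's actual API.

```python
import copy
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ReductionObject:
    """Hypothetical programmer-declared reduction object (word counts)."""
    counts: Counter = field(default_factory=Counter)

    def reduce(self, element: str) -> None:
        # Local reduction: fold one input element straight into the object,
        # without emitting intermediate (key, value) pairs.
        for word in element.split():
            self.counts[word] += 1

    def merge(self, other: "ReductionObject") -> None:
        # Global reduction: combine the objects produced by different nodes.
        self.counts.update(other.counts)


@dataclass
class Checkpoint:
    robj: ReductionObject  # cached copy of the node's computation state
    chunks_done: int       # number of input chunks that state covers


class Node:
    def __init__(self, node_id, chunks, store, interval=2):
        self.node_id = node_id
        self.chunks = chunks      # this node's share of the input
        self.store = store        # hypothetical stand-in for "another location"
        self.interval = interval  # assumed: checkpoint every `interval` chunks
        self.robj = ReductionObject()
        self.chunks_done = 0

    def restore(self) -> None:
        # Resume from the last cached reduction object, if one exists.
        cp = self.store.get(self.node_id)
        if cp is not None:
            self.robj = copy.deepcopy(cp.robj)
            self.chunks_done = cp.chunks_done

    def run(self, fail_at=None) -> ReductionObject:
        for i in range(self.chunks_done, len(self.chunks)):
            if i == fail_at:  # simulated crash before processing chunk i
                raise RuntimeError(f"node {self.node_id} failed at chunk {i}")
            for element in self.chunks[i]:
                self.robj.reduce(element)
            self.chunks_done = i + 1
            if self.chunks_done % self.interval == 0:
                # Periodically cache a copy of the reduction object elsewhere.
                self.store[self.node_id] = Checkpoint(
                    copy.deepcopy(self.robj), self.chunks_done)
        return self.robj


def run_job(inputs, fail_node=None, fail_at=None) -> Counter:
    store = {}
    result = ReductionObject()
    for node_id, chunks in inputs.items():
        node = Node(node_id, chunks, store)
        try:
            robj = node.run(fail_at if node_id == fail_node else None)
        except RuntimeError:
            # Recovery: a replacement restores the cached object and
            # reprocesses only the chunks after the last checkpoint.
            replacement = Node(node_id, chunks, store)
            replacement.restore()
            robj = replacement.run()
        result.merge(robj)
    return result.counts


inputs = {
    0: [["a b"], ["b c"], ["c d"], ["d e"]],
    1: [["a a"], ["b"], ["c c"], ["e"]],
}
# Node 1 crashes after its checkpoint at 2 chunks; the result is unchanged.
print(run_job(inputs) == run_job(inputs, fail_node=1, fail_at=3))  # True
```

Restoring a deep copy of the cached object, rather than reusing the failed node's live state, keeps chunks processed after the last checkpoint from being counted twice. In this sketch, work done since the last checkpoint is lost and redone, so the checkpoint interval trades runtime overhead against recovery cost; caching is cheap when the reduction object is small relative to the input, as with these counts.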