Conference: IEEE International Symposium on Parallel & Distributed Processing

Supporting fault tolerance in a data-intensive computing middleware


Abstract

Over the last 2-3 years, the importance of data-intensive computing has been increasingly recognized, closely coupled with the emergence and popularity of map-reduce for developing this class of applications. Besides programmability and ease of parallelization, fault tolerance is clearly important for data-intensive applications, because of their long-running nature and because of the potential for using a large number of nodes to process massive amounts of data. Fault tolerance has also been an important attribute of map-reduce in its Hadoop implementation, where it is based on replication of data in the file system. Two important goals in supporting fault tolerance are low overheads and efficient recovery. With these goals, this paper describes a different approach for enabling data-intensive computing with fault tolerance. Our approach is based on an API for developing data-intensive computations that is a variation of map-reduce and involves an explicit programmer-declared reduction object. We show how more efficient fault-tolerance support can be developed using this API. In particular, because the reduction object represents the state of the computation on a node, we can periodically cache the reduction object from every node at another location and use it to support failure recovery. We have extensively evaluated our approach using two data-intensive applications. Our results show that the overheads of our scheme are extremely low, and that our system outperforms Hadoop both in the absence and in the presence of failures.
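The idea of caching the reduction object can be illustrated with a minimal single-process sketch. It is not the paper's middleware or its API: the names `CheckpointStore`, `run_node`, and `word_count`, the word-count workload, the fixed checkpoint interval, and the in-memory dictionary standing in for "another location" are all assumptions made for illustration. Each simulated node updates a programmer-declared reduction object (here a `Counter`), a snapshot of that object and the position reached is saved periodically, and a restarted node resumes from the last snapshot instead of reprocessing its whole chunk.

```python
import copy
from collections import Counter


class CheckpointStore:
    """Hypothetical stand-in for 'another location' (in-memory here)."""

    def __init__(self):
        self._saved = {}

    def save(self, node_id, next_index, reduction_object):
        # Deep-copy so later updates on the node do not alter the snapshot.
        self._saved[node_id] = (next_index, copy.deepcopy(reduction_object))

    def load(self, node_id):
        # Return (next_index, reduction_object), or None if nothing is cached.
        snap = self._saved.get(node_id)
        if snap is None:
            return None
        return snap[0], copy.deepcopy(snap[1])


def run_node(node_id, chunk, store, checkpoint_every=3, fail_at=None):
    """Process one node's chunk, resuming from a cached snapshot if present."""
    snapshot = store.load(node_id)
    start, robj = snapshot if snapshot is not None else (0, Counter())
    for i in range(start, len(chunk)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure on node {node_id}")
        robj[chunk[i]] += 1  # local reduction: update the state in place
        if (i + 1) % checkpoint_every == 0:
            store.save(node_id, i + 1, robj)
    return robj


def word_count(chunks, store, fail_node=None, fail_at=None):
    """Run every simulated node, recover from one failure, merge results."""
    total = Counter()
    for node_id, chunk in enumerate(chunks):
        try:
            robj = run_node(node_id, chunk, store,
                            fail_at=fail_at if node_id == fail_node else None)
        except RuntimeError:
            # Recovery: rerun the node; it resumes after the last snapshot.
            robj = run_node(node_id, chunk, store)
        total.update(robj)  # global reduction: merge per-node objects
    return total
```

In this sketch the snapshot is only as large as the reduction object, not the input data, which is one plausible reading of why the abstract reports low overheads; the actual checkpoint frequency, placement, and recovery protocol used by the authors are not given in the abstract.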
