Procedia Computer Science

Fault-Tolerant Grid-Based Solvers: Combining Concepts from Sparse Grids and MapReduce

Abstract

A key issue confronting petascale and exascale computing is the growth in the probability of soft and hard faults with increasing system size. A promising approach to this problem is the use of algorithms that are inherently fault tolerant. We introduce such an algorithm for the solution of partial differential equations, based on the sparse grid approach. Here, the solutions on multiple component grids are efficiently combined to obtain a solution on a full grid. The technique also lends itself to a (modified) MapReduce framework on a cluster of processors, with the map stage corresponding to allocating each component grid for solution over a subset of the processors, and the reduce stage corresponding to their combination. We describe how the sparse grid combination method can be modified to robustly solve partial differential equations in the presence of faults. This is based on a modified combination formula that can accommodate the loss of one or two component grids. We also discuss accuracy issues associated with this formula. We give details of a prototype implementation within a MapReduce framework using the dynamic process features and asynchronous message passing facilities of MPI. Results on a two-dimensional advection problem show that the errors after the loss of one or two sub-grids are within a factor of 3 of the error of the sparse grid solution when no faults occur. They also indicate that the sparse grid technique with four times the resolution has approximately the same error as a full grid, while requiring (for a sufficiently high resolution) much less computation and memory. We finally outline a MapReduce variant capable of responding to faults in ways other than re-scheduling of failed tasks. We discuss the likely software requirements for such a flexible MapReduce framework, the requirements it will impose on users' legacy codes, and the system's runtime behavior.
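The abstract does not spell out the modified combination formula. As a hedged illustration only, the sketch below uses the general inclusion-exclusion rule for combination coefficients over a downward-closed set of level vectors, a known generalisation of the classical combination technique and not necessarily the formula used in the paper. When a component grid is lost, that grid and every level vector dominating it are dropped so the set stays downward closed, and the coefficients are recomputed. The function names (`classical_index_set`, `combination_coefficients`, `remove_failed`) and the level convention (every level >= 1, level sum <= n) are assumptions made for this example. In the MapReduce framing of the abstract, the map stage would solve each grid in the index set and the reduce stage would sum the solutions weighted by these coefficients.

```python
from itertools import product


def classical_index_set(n, dim=2):
    """Hypothetical helper: all level vectors l with every l_k >= 1 and sum(l) <= n.

    This set is downward closed, which the inclusion-exclusion rule below requires.
    """
    return {l for l in product(range(1, n + 1), repeat=dim) if sum(l) <= n}


def combination_coefficients(index_set):
    """Inclusion-exclusion coefficients over a downward-closed index set I:

        c_l = sum over z in {0,1}^d of (-1)^|z| * [l + z in I]

    Only nonzero coefficients are returned.
    """
    if not index_set:
        return {}
    dim = len(next(iter(index_set)))
    coeffs = {}
    for l in index_set:
        c = 0
        for z in product((0, 1), repeat=dim):
            shifted = tuple(a + b for a, b in zip(l, z))
            if shifted in index_set:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[l] = c
    return coeffs


def remove_failed(index_set, failed):
    """Assumed fault-handling step: drop the lost grid and every level vector
    that dominates it componentwise, so the remaining set stays downward closed."""
    return {l for l in index_set if not all(a >= b for a, b in zip(l, failed))}


if __name__ == "__main__":
    grids = classical_index_set(5)
    print("fault-free:", sorted(combination_coefficients(grids).items()))
    survivors = remove_failed(grids, (2, 3))  # pretend grid (2, 3) was lost
    print("after loss:", sorted(combination_coefficients(survivors).items()))
```

With n = 5 in two dimensions this reproduces the classical rule (+1 on the diagonal l1 + l2 = 5, -1 on l1 + l2 = 4). Losing grid (2, 3) sets the coefficients of (1, 3) and (2, 2) to zero and assigns -1 to (1, 2), so the coefficients still sum to 1 and no component grid needs to be recomputed.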