Fault-Tolerant Grid-Based Solvers: Combining Concepts from Sparse Grids and MapReduce

J.W. Larson; M. Hegland; B. Harding; S. Roberts; L. Stals; A.P. Rendell; P. Strazdins; M.M. Ali; C. Kowitz; R. Nobes; J. Southern; N. Wilson; M. Li; Y. Oishi

Abstract

A key issue confronting petascale and exascale computing is the growth in probability of soft and hard faults with increasing system size. A promising approach to this problem is the use of algorithms that are inherently fault tolerant. We introduce such an algorithm for the solution of partial differential equations, based on the sparse grid approach. Here, the solutions of multiple component grids are efficiently combined to achieve a solution on a full grid. The technique also lends itself to a (modified) MapReduce framework on a cluster of processors, with the map stage corresponding to allocating each component grid for solution over a subset of the processors, and the reduce stage corresponding to their combination. We describe how the sparse grid combination method can be modified to robustly solve partial differential equations in the presence of faults. This is based on a modified combination formula that can accommodate the loss of one or two component grids. We also discuss accuracy issues associated with this formula. We give details of a prototype implementation within a MapReduce framework using the dynamic process features and asynchronous message passing facilities of MPI. Results on a two-dimensional advection problem show that the errors after the loss of one or two sub-grids are within a factor of 3 of the error of the sparse grid solution computed without faults. They also indicate that the sparse grid technique with four times the resolution has approximately the same error as a full grid, while (for a sufficiently high resolution) needing much less computation and memory. We finally outline a MapReduce variant capable of responding to faults in ways other than re-scheduling of failed tasks. We discuss the likely software requirements for such a flexible MapReduce framework, the requirements it will impose on users' legacy codes, and the system's runtime behavior.
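The abstract does not state the combination formula itself, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It uses the classical two-dimensional combination technique, in which component grids with levels i + j = m get coefficient +1 and those with i + j = m - 1 get coefficient -1, and it computes these coefficients with a generic inclusion-exclusion rule over a downward-closed set of levels. Recomputing the coefficients after dropping a finest-level grid is one plausible way to accommodate a lost component grid; whether this matches the paper's modified formula is an assumption. The names `index_set_2d`, `combination_coefficients`, `map_stage`, `reduce_stage` and `toy_solve` are hypothetical, and `toy_solve` is a scalar stand-in for a real PDE solve.

```python
from itertools import product


def index_set_2d(max_sum):
    """Level pairs (i, j) with i, j >= 1 and i + j <= max_sum."""
    return {(i, j)
            for i in range(1, max_sum)
            for j in range(1, max_sum)
            if i + j <= max_sum}


def combination_coefficients(levels):
    """Inclusion-exclusion coefficients for a downward-closed level set.

    c[l] = sum over z in {0,1}^d of (-1)^|z| * [l + z is in levels].
    Levels whose coefficient is zero are left out of the result.
    """
    coeffs = {}
    for level in levels:
        c = 0
        for z in product((0, 1), repeat=len(level)):
            shifted = tuple(a + b for a, b in zip(level, z))
            if shifted in levels:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[level] = c
    return coeffs


def map_stage(coeffs, solve):
    """'Map': solve every needed component grid (sequentially here)."""
    return {level: solve(level) for level in coeffs}


def reduce_stage(coeffs, solutions):
    """'Reduce': weighted sum of the component solutions."""
    return sum(c * solutions[level] for level, c in coeffs.items())


def toy_solve(level):
    # Hypothetical stand-in for a PDE solve: exact value 1.0 plus an
    # error of the form h_i^2 + h_j^2, with mesh width h = 2^-level.
    i, j = level
    return 1.0 + 4.0 ** -i + 4.0 ** -j


if __name__ == "__main__":
    levels = index_set_2d(5)
    coeffs = combination_coefficients(levels)
    print(sorted(coeffs.items()))
    print(reduce_stage(coeffs, map_stage(coeffs, toy_solve)))

    # Simulate losing one finest-level grid. Removing a maximal level
    # keeps the set downward closed, so the same rule still applies.
    surviving = levels - {(2, 3)}
    coeffs_after_loss = combination_coefficients(surviving)
    print(sorted(coeffs_after_loss.items()))
    print(reduce_stage(coeffs_after_loss,
                       map_stage(coeffs_after_loss, toy_solve)))
```

In this sketch the coefficients always sum to 1, so a constant solution is reproduced exactly, and a lost grid only changes which coarser grids enter the reduce stage. In a real setting each `solve` call would run on its own subset of processors and return a grid function rather than a scalar.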

Bibliographic details

Source: Procedia Computer Science, 2013, Issue 1, 10 pages
Authors: J.W. Larson; M. Hegland; B. Harding; S. Roberts; L. Stals; A.P. Rendell; P. Strazdins; M.M. Ali; C. Kowitz; R. Nobes; J. Southern; N. Wilson; M. Li; Y. Oishi
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Parallel computing; partial differential equations; fault-tolerance; sparse grids; MapReduce

Similar literature

1. Parallel implementation of an efficient preconditioned linear solver for grid-based applications in chemical physics. III: Improved parallel scalability for sparse matrix-vector products [J]. Wenwu Chen, Bill Poirier. Journal of Parallel and Distributed Computing. 2010, Issue 7
2. Grid-Based Parallel Algorithms of Join Queries for Analyzing Multi-Dimensional Data on MapReduce [J]. Miyoung JANG, Jae-Woo CHANG. IEICE Transactions on Information and Systems. 2018, Issue 4
3. Grid-based switch fabrics: a new approach in designing fault-tolerant ATM switches [J]. H.S. Laskaridis, A.A. Veglis, G.I. Papadimitriou. Computer Communications. 2001, Issue 15a16
4. Fault-Tolerant Grid-Based Solvers: Combining Concepts from Sparse Grids and MapReduce [C]. J. W. Larson, M. Hegland, B. Harding. International Conference on Computational Science. 2013
5. Sparse grid-based modeling and control of biological systems [D]. Donahue, Maia Mahoney. 2009
6. Impact of heterogeneity-corrected dose calculation using a grid-based Boltzmann solver on breast and cervix cancer brachytherapy [O]. Julia Hofbauer, Prof. Christian Kirisits, Alexandra Resch. 2016
7. Fault-Tolerant Grid-Based Solvers: Combining Concepts from Sparse Grids and MapReduce [O]. Larson J.W., Hegland M., Harding B. 2013
8. Fault-tolerant flight control system combining expert system and analytical redundancy concepts [R]. Handelman, Dave. 1987
