
An error-resilient redundant subspace correction method


Abstract

Due to the increasing complexity of supercomputers, hard and soft errors are causing more and more problems in high-performance scientific and engineering computation. To improve the reliability (increase the mean time to failure) of computing systems, considerable effort has been devoted to developing techniques to forecast, prevent, and recover from errors at different levels, including architecture, application, and algorithm. In this paper, we focus on algorithmic error-resilient iterative solvers and introduce a redundant subspace correction method. Using a general framework of redundant subspace corrections, we construct iterative methods that have the following properties: (1) they maintain convergence when an error occurs, assuming it is detectable; (2) they introduce low computational overhead when no error occurs; (3) they require only a small amount of point-to-point communication compared to traditional methods and maintain good load balance; (4) they improve the mean time to failure. Preliminary numerical experiments demonstrate the efficiency and effectiveness of the new subspace correction method. For simplicity, the main ideas of the proposed framework are demonstrated using Schwarz methods without a coarse space, which do not scale well in practice.
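To make property (1) concrete, the following is a minimal sketch, not the paper's algorithm: the abstract does not specify how the redundant subspaces are built. It assumes a 1D Poisson model problem, a multiplicative Schwarz sweep without a coarse space, and a hypothetical fault model in which the set of failed subdomains is known (detectable) and their corrections are simply skipped. All function and variable names (`schwarz_solve`, `PLAIN`, `REDUNDANT`, and so on) are illustrative. Redundancy here means that every unknown is covered by two subdomains, so losing one subdomain still leaves every unknown covered.

```python
import math

def poisson_matrix(n):
    """Dense 1D Poisson matrix tridiag(-1, 2, -1); symmetric positive definite."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]

def solve_dense(M, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def residual(A, b, x):
    """Return b - A x."""
    return [b[i] - sum(A[i][j] * x[j] for j in range(len(x)))
            for i in range(len(b))]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in v))

def schwarz_solve(A, b, subdomains, sweeps, failed=frozenset()):
    """Multiplicative Schwarz without a coarse space.

    `failed` is an assumed, already-detected set of subdomain ids whose
    local corrections are lost; those subdomains are skipped in every sweep.
    """
    x = [0.0] * len(b)
    for _ in range(sweeps):
        for k, idx in enumerate(subdomains):
            if k in failed:
                continue  # correction from this subdomain is unavailable
            r = residual(A, b, x)
            A_loc = [[A[i][j] for j in idx] for i in idx]
            e = solve_dense(A_loc, [r[i] for i in idx])
            for i, ei in zip(idx, e):
                x[i] += ei
    return x

# A non-overlapping partition of 12 unknowns: each unknown is covered once.
PLAIN = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]
# The same partition plus a shifted one: each unknown is covered twice.
REDUNDANT = PLAIN + [list(range(0, 2)), list(range(2, 6)),
                     list(range(6, 10)), list(range(10, 12))]

if __name__ == "__main__":
    A, b = poisson_matrix(12), [1.0] * 12
    for name, subs in (("plain", PLAIN), ("redundant", REDUNDANT)):
        x = schwarz_solve(A, b, subs, sweeps=300, failed={1})
        print(name, "residual norm with subdomain 1 lost:",
              norm(residual(A, b, x)))
```

In this sketch, losing subdomain 1 of the plain partition leaves unknowns 4 to 7 without any correction, so the iteration stalls, whereas the redundant covering still spans the whole space and the iteration keeps converging. The price is roughly twice the local work per sweep, which is why this toy version says nothing about property (2); keeping the fault-free overhead low is the part the paper's framework addresses and this sketch does not.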

Bibliographic record

  • Source
    Computing and visualization in science | 2017, Issue 3 | pp. 65-77 | 13 pages
  • Author affiliations

    State Key Laboratory of Scientific and Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China;

    Department of Mathematics, Pennsylvania State University, University Park, PA, USA;

    State Key Laboratory of Scientific and Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    fault-tolerance; error resilience; subspace correction; Schwarz methods

  • Added to database: 2022-08-18 00:51:25
