Journal: 电子与信息学报 (Journal of Electronics & Information Technology)

A Fault-Tolerant Algorithm for the Large-Scale Parallel Higher-Order Method of Moments

Abstract

Large-scale parallel electromagnetic computation on supercomputers is of great significance for solving complex electromagnetic problems in practical engineering. However, process crashes caused by node failures are far more likely on a supercomputer than on an ordinary computer. Because conventional electromagnetic computation cannot cope effectively with process crashes, this paper proposes an efficient fault-tolerant algorithm suited to the large-scale parallel higher-order Method of Moments (MoM). Building on the existing parallel higher-order MoM, it designs an efficient and highly reliable scene-protection (checkpointing) algorithm based on "disk cache" and "direct memory access" techniques, together with an efficient breakpoint-recovery algorithm. The effectiveness of the approach rests mainly on its "fixed scene-protection points", which allow the computation to proceed in a normal, orderly way even when failures occur, whereas the original algorithm must restart from the beginning after every failure. Numerical simulations verify the effectiveness of the fault-tolerant algorithm in handling process crashes and show that it greatly improves the reliability of the large-scale parallel higher-order MoM.
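
The abstract gives no implementation details, so the following is only a minimal sketch of generic checkpoint/restart at fixed points, not the authors' algorithm. A simple accumulation loop stands in for the MoM computation. All names (`run`, `save_checkpoint`, `load_checkpoint`, `CHECKPOINT_EVERY`) and the pickle-on-disk format are assumptions made for illustration.

```python
import os
import pickle
import tempfile

CHECKPOINT_EVERY = 4  # hypothetical: save state after every 4 completed steps


def save_checkpoint(path, state):
    """Write state to disk atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # A crash before this rename leaves the previous checkpoint intact.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "acc": 0}


def run(path, total_steps, crash_at=None):
    """Sum the squares of 0..total_steps-1, resuming from the last checkpoint.

    crash_at simulates a process crash just before that step executes.
    """
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated process crash")
        state["acc"] += state["step"] ** 2  # stand-in for one block of work
        state["step"] += 1
        if state["step"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

With this sketch, a run that crashes partway through loses only the work done since the last fixed checkpoint. Calling `run` again with the same path resumes from the saved step instead of from step 0.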
