Globally Precise-restartable Execution of Parallel Programs

Abstract

Emerging trends in computer design and use are likely to make exceptions, once rare, the norm, especially as system size grows. Because of exceptions arising from hardware faults, approximate computing, dynamic resource management, etc., successful and error-free execution of programs may no longer be assured. Yet designers will want to tolerate these exceptions so that programs execute completely, efficiently, and without external intervention. Modern computers easily handle exceptions in sequential programs using precise interrupts, but they are ill-equipped to handle exceptions in parallel programs, which are growing in prevalence. In this work we introduce the notion of globally precise-restartable execution of parallel programs, analogous to precise-interruptible execution of sequential programs. We present a software runtime recovery system, based on this approach, that handles exceptions in suitably written parallel programs. Qualitative and quantitative analyses show that, unlike the conventional checkpoint-and-recovery method, the proposed system scales with system size, especially when exceptions are frequent.