Journal: International Journal of Parallel Programming

RedThreads: An Interface for Application-Level Fault Detection/Correction Through Adaptive Redundant Multithreading


Abstract

In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Because current HPC programming paradigms and execution environments are fault-oblivious, HPC applications are insufficiently equipped to deal with errors. We believe that HPC applications should be given the capability to actively search for and correct errors in their computations. The redundant multithreading (RMT) approach offers lightweight replicated execution streams of program instructions within the context of a single application process. However, complete redundancy imposes significant overhead on application performance. In this paper we present RedThreads, an interface that provides application-level fault detection and correction based on RMT, but applies the thread-level redundancy adaptively. We describe the RedThreads syntax and semantics, and the supporting compiler infrastructure and runtime system. Our approach enables application programmers to scope the extent of redundant computation. Additionally, the runtime system permits the use of RMT to be dynamically enabled or disabled, based on the resiliency needs of the application and the state of the system. Our experimental results demonstrate how adaptive RMT exploits programmer insight and runtime inference to dynamically navigate the trade-off space between an application's resilience coverage and the associated performance overhead of redundant computation.
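The general idea of scoped, switchable redundancy can be illustrated with a minimal Python sketch. This is not the RedThreads syntax or runtime, neither of which is shown in the abstract; the names `redundant`, `run_in_thread`, and the `enabled` flag are hypothetical, and the choice of two replicas plus a tie-breaking third run is an assumption made for illustration only.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def run_in_thread(fn: Callable[[], T], out: list, idx: int) -> threading.Thread:
    """Start fn on its own thread and store its result in out[idx]."""
    def body() -> None:
        out[idx] = fn()

    t = threading.Thread(target=body)
    t.start()
    return t


def redundant(fn: Callable[[], T], enabled: bool = True) -> T:
    """Hypothetical helper: run fn redundantly only when enabled."""
    if not enabled:
        # Redundancy switched off: a single plain execution.
        return fn()

    # Detection: two replicas of the same computation, results compared.
    results = [None, None]
    threads = [run_in_thread(fn, results, i) for i in range(2)]
    for t in threads:
        t.join()
    if results[0] == results[1]:
        return results[0]

    # Correction (assumed scheme): a third run breaks the tie by majority.
    third = fn()
    if third == results[0] or third == results[1]:
        return third
    raise RuntimeError("no two replicas agree; cannot correct the result")


if __name__ == "__main__":
    data = list(range(1000))
    # Protect only the region the programmer considers critical.
    print(redundant(lambda: sum(x * x for x in data), enabled=True))
```

In this sketch the programmer scopes redundancy by choosing which callable to wrap, and a caller standing in for the runtime could set `enabled` from whatever policy it uses, which mirrors the coverage-versus-overhead trade-off the abstract describes.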