European Dependable Computing Conference

An Architectural Framework for Detecting Process Hangs/Crashes

Abstract

This paper addresses the challenges of implementing heartbeat-based detection of process crashes and hangs in practice. We propose an in-processor hardware module that reduces error-detection latency and instrumentation overhead. Three hardware techniques, integrated into the main pipeline of a superscalar processor, are presented: (i) Instruction Count Heartbeat (ICH), which detects process crashes and a class of hangs in which the process exists but is not executing any instructions; (ii) Infinite Loop Hang Detector (ILHD), which captures process hangs caused by infinite execution of legitimate loops; and (iii) Sequential Code Hang Detector (SCHD), which detects process hangs in illegal loops. The proposed design has the following unique features: 1) operating-system-neutral detection techniques; 2) no instrumentation is needed to detect any application crash or OS hang; and 3) an automated, lightweight compile-time instrumentation methodology for detecting all process hangs (including infinite loops), with the detection itself performed by the hardware module at runtime. The proposed techniques can support heartbeat protocols for detecting operating system and process crashes and hangs in distributed systems. Evaluation of the hang-detection techniques shows a low performance overhead of 1.6% and a memory overhead of 6% for the instrumentation. The crash-detection technique incurs no performance overhead and has a detection latency of a few instructions.
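The two simplest checks named in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's hardware design: the class names, the idle-window threshold, and the per-loop iteration bounds are hypothetical, and SCHD is not modeled. It assumes that ICH amounts to watching a per-process retired-instruction counter for progress, and that the compile-time instrumentation for ILHD supplies each legitimate loop with an upper iteration bound.

```python
"""Hypothetical software model of the ICH and ILHD ideas.

Names, thresholds, and interfaces are illustrative assumptions only;
they do not reproduce the paper's in-processor hardware module.
"""
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InstructionCountHeartbeat:
    """ICH-style check (assumed policy): suspect a crash or hang when the
    retired-instruction count stops advancing for several sampling windows."""
    max_idle_windows: int = 3  # hypothetical threshold
    _last_count: Optional[int] = field(default=None, init=False, repr=False)
    _idle_windows: int = field(default=0, init=False, repr=False)

    def sample(self, retired_instructions: int) -> bool:
        """Feed the counter value at the end of a window.

        Returns True when a crash or hang is suspected."""
        if self._last_count is not None and retired_instructions == self._last_count:
            self._idle_windows += 1  # no progress in this window
        else:
            self._idle_windows = 0   # progress observed; reset
        self._last_count = retired_instructions
        return self._idle_windows >= self.max_idle_windows


@dataclass
class LoopHangDetector:
    """ILHD-style check (assumed): each instrumented loop carries an upper
    iteration bound; exceeding it is treated as an infinite-loop hang."""
    bounds: Dict[str, int]  # hypothetical loop id -> iteration bound
    _iterations: Dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def enter(self, loop_id: str) -> None:
        """Reset the iteration counter when control enters the loop."""
        self._iterations[loop_id] = 0

    def back_edge(self, loop_id: str) -> bool:
        """Record one taken back-edge; return True if the bound is exceeded."""
        self._iterations[loop_id] = self._iterations.get(loop_id, 0) + 1
        return self._iterations[loop_id] > self.bounds[loop_id]


if __name__ == "__main__":
    ich = InstructionCountHeartbeat(max_idle_windows=2)
    print([ich.sample(c) for c in [100, 250, 250, 250]])

    ilhd = LoopHangDetector(bounds={"loop0": 3})
    ilhd.enter("loop0")
    print([ilhd.back_edge("loop0") for _ in range(4)])
```

In this model the heartbeat needs no cooperation from the monitored code, which loosely mirrors the abstract's claim that crash detection requires no instrumentation, while the loop check depends on bounds supplied ahead of time, standing in for the compile-time instrumentation step.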
