Conference: IEEE/IFIP International Conference on Dependable Systems and Networks

FIRestarter: Practical Software Crash Recovery with Targeted Library-level Fault Injection

Abstract

Despite advances in software testing, many bugs still plague deployed software, leading to crashes and thus service disruption in high-availability production applications. Existing crash recovery solutions are either limited to transient faults or require manual annotations to target predetermined persistent bugs. Moreover, existing solutions are generally inefficient, hindering practical deployment. In this paper, we present FIRestarter (Fault Injection-based Restarter), an efficient and automatic crash recovery solution for commodity user applications. To eliminate the need for manual annotations, FIRestarter injects targeted software faults at the library interface to automatically trigger error-handling code for standard library calls that are already part of the application. In particular, when a crash occurs, we roll back the application state to the point before the last recoverable library call, inject a fault, and restart execution, forcing the call to immediately return a predetermined error code. This strategy allows the application to automatically bypass the crashing code upon such a restart and exploits existing error-handling code to recover from even persistent bugs. Moreover, since library calls are pervasive throughout the code, our design provides a large recovery surface despite the automated approach. Finally, FIRestarter’s recovery windows are small and frequent compared to traditional checkpoint-restart, which enables new optimizations, such as supporting rollback by means of hybrid hardware/software transactional memory instrumentation, and improves performance. We apply FIRestarter to a number of event-driven server applications and show that our solution achieves near-instantaneous, state-preserving crash recovery in the face of even persistent crashes. On popular web servers, our evaluation results show a recovery surface of at least 77%, with low performance overhead of at most 17%.
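The recovery strategy described in the abstract (checkpoint before a library call, roll back on a crash, then replay with the call forced to return an error) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not FIRestarter's implementation: the names `Crash`, `run_window`, and `handle` are hypothetical, a Python exception stands in for a fatal signal, and a `copy.deepcopy` snapshot stands in for the transactional-memory rollback the abstract mentions.

```python
import copy


class Crash(Exception):
    """Hypothetical stand-in for a fatal signal such as SIGSEGV."""


def run_window(state, lib_call, error_value, rest):
    """Run one recovery window: a library call plus the code consuming its result.

    state       -- mutable dict holding the application state
    lib_call    -- zero-argument callable modelling the library call
    error_value -- the predetermined error result to inject after a crash
    rest        -- callable(state, result) modelling the code after the call
    """
    snapshot = copy.deepcopy(state)        # checkpoint before the library call
    try:
        return rest(state, lib_call())     # normal execution
    except Crash:
        state.clear()                      # roll back to the checkpoint
        state.update(snapshot)
        return rest(state, error_value)    # replay with the injected fault


def handle(state, buf):
    """Toy request handler with a persistent bug on 'bad' requests."""
    if buf is None:                        # the application's existing error path
        state["errors"] += 1
        return "503 Service Unavailable"
    state["served"] += 1                   # side effect undone by the rollback
    if state["request"] == "bad":
        raise Crash("persistent bug")
    return "200 OK"


if __name__ == "__main__":
    state = {"request": "bad", "served": 0, "errors": 0}
    print(run_window(state, lambda: bytearray(16), None, handle))
    print(state)
```

In this toy run the "bad" request crashes on every normal execution, so a plain restart would never help. Replaying with the allocation forced to fail steers the handler into its existing error path instead, and the partial update to `served` is undone by the rollback.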