How fail-stop are faulty programs?


Abstract

Most fault-tolerant systems are designed to stop faulty programs before they write permanent data or communicate with other processes. This property (halt-on-failure) forms the core of the fail-stop model. Unfortunately, little experimental data exists on whether program failures follow the fail-stop model. This paper describes a tool, based on the SimOS complete-machine simulator, that can trace how faults propagate through memory, disk, and functions. Using this tool on the Postgres database system, we conduct a controlled experiment to measure how often faulty programs violate the fail-stop model. We find that a significant number of faults (7%) violate the fail-stop model by writing incorrect data to stable storage before halting. We then apply Postgres' transaction mechanism to undo recent changes before a crash and find that transactions reduce fail-stop violations by a factor of 3.
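
To make the fail-stop criterion concrete, the following is a toy sketch only; it is not the SimOS-based tool described in the abstract. It assumes a run can be reduced to a hypothetical event list (`fault`, `copy`, `disk_write`, `commit`, `halt`) and treats a run as a fail-stop violation when data derived from a fault reaches stable storage before the program halts. The `use_transactions` flag models, very roughly, undoing uncommitted writes at the crash.

```python
from typing import Iterable, Tuple

# Hypothetical event encoding (an assumption, not from the paper):
#   ("fault", loc)       loc now holds corrupted data
#   ("copy", src, dst)   dst is overwritten with the contents of src
#   ("disk_write", loc)  contents of loc are written to stable storage
#   ("commit",)          writes since the last commit become durable
#   ("halt",)            the program stops (crash or detected error)
Event = Tuple[str, ...]


def violates_fail_stop(events: Iterable[Event],
                       use_transactions: bool = False) -> bool:
    """Return True if corrupted data is durable on disk when the run halts."""
    tainted = set()       # memory locations currently holding corrupted data
    pending_bad = False   # corrupted write not yet committed
    durable_bad = False   # corrupted write that survives the halt

    for ev in events:
        kind = ev[0]
        if kind == "fault":
            tainted.add(ev[1])
        elif kind == "copy":
            _, src, dst = ev
            # Corruption propagates with the data; a clean copy cleans dst.
            if src in tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif kind == "disk_write":
            if ev[1] in tainted:
                if use_transactions:
                    pending_bad = True   # undone unless a commit follows
                else:
                    durable_bad = True   # durable immediately
        elif kind == "commit":
            if pending_bad:
                durable_bad = True
                pending_bad = False
        elif kind == "halt":
            break  # uncommitted writes are discarded at this point

    return durable_bad


run = [("fault", "a"), ("copy", "a", "b"), ("disk_write", "b"), ("halt",)]
print(violates_fail_stop(run))                         # no transactions
print(violates_fail_stop(run, use_transactions=True))  # write undone at halt
```

In this simplified model the same run counts as a violation without transactions and as fail-stop with them, which mirrors the direction of the reported effect but says nothing about its magnitude.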
