Journal: IEEE Transactions on Dependable and Secure Computing

Generic Soft-Error Detection and Correction for Concurrent Data Structures

Abstract

Recent studies indicate that transient memory errors (soft errors) have become a relevant source of system failures. This paper presents a generic software-based fault-tolerance mechanism that transparently recovers from memory errors in object-oriented program data structures. Its main benefit is the flexibility to choose from an extensible toolbox of easily pluggable error detection and correction schemes, such as Hamming and CRC codes. This is achieved by a combination of aspect-oriented and generative programming techniques. Furthermore, we present a wait-free synchronization algorithm for error detection in data structures that are used concurrently by multiple threads of control. We give a formal correctness proof and show the excellent scalability of our approach in a multiprocessor environment. In a case study, we present our experiences with selectively hardening the eCos operating system and its benchmark suite. We explore the trade-off between resiliency and performance by choosing only the most vulnerable data structures for error recovery. Thereby, the total number of system failures, manifesting as silent data corruptions and crashes, is reduced by 69.14 percent at a negligible runtime overhead of 0.36 percent.
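To make the idea of pluggable detection and correction schemes concrete, here is a minimal sketch in Python. It is not the paper's implementation, which relies on aspect-oriented and generative programming. The names `Protected`, `CRCScheme`, `HammingScheme`, and `SoftError` are hypothetical and introduced only for illustration. The sketch assumes that a scheme is any object with `encode` and `decode` methods, that CRC-32 is used for detection only, and that a Hamming(7,4) code per 4-bit nibble is used for single-bit correction.

```python
import zlib


class SoftError(Exception):
    """Raised when corruption is detected but cannot be repaired."""


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits as a 7-bit codeword; bit i holds position i+1."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(code: int) -> int:
    """Correct at most one flipped bit and return the 4 data bits."""
    bits = [(code >> i) & 1 for i in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        bits[syndrome - 1] ^= 1  # the syndrome names the faulty position
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)


class CRCScheme:
    """Detection only: the payload is followed by a 4-byte CRC-32."""

    def encode(self, payload: bytes) -> bytearray:
        return bytearray(payload + zlib.crc32(payload).to_bytes(4, "big"))

    def decode(self, stored: bytearray) -> bytes:
        payload, check = bytes(stored[:-4]), bytes(stored[-4:])
        if zlib.crc32(payload).to_bytes(4, "big") != check:
            raise SoftError("CRC mismatch")
        return payload


class HammingScheme:
    """Correction: each payload byte becomes two Hamming(7,4) codewords."""

    def encode(self, payload: bytes) -> bytearray:
        out = bytearray()
        for byte in payload:
            out.append(hamming74_encode(byte & 0x0F))  # low nibble
            out.append(hamming74_encode(byte >> 4))    # high nibble
        return out

    def decode(self, stored: bytearray) -> bytes:
        nib = [hamming74_decode(c) for c in stored]
        return bytes(lo | (hi << 4) for lo, hi in zip(nib[0::2], nib[1::2]))


class Protected:
    """Holds a payload in encoded form and checks it on every read."""

    def __init__(self, payload: bytes, scheme):
        self._scheme = scheme
        self._stored = scheme.encode(payload)

    def read(self) -> bytes:
        payload = self._scheme.decode(self._stored)
        self._stored = self._scheme.encode(payload)  # write back a clean copy
        return payload
```

Swapping `HammingScheme()` for `CRCScheme()` when constructing a `Protected` object changes the behavior from silent single-bit repair to detection with a raised `SoftError`. This mirrors, in a very reduced form, the trade-off between resiliency and overhead that the abstract describes. The sketch is single-threaded and does not attempt the wait-free synchronization the paper presents.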