Journal: IEEE Transactions on Computers

Fault-containment in cache memories for TMR redundant processor systems


Abstract

Cache data errors read by a processor may cause CPU control-flow errors and force a redundant processor system into a CPU-cache reintegration process. Reintegration degrades system performance and reliability. To reduce the occurrence of such events, we propose a real-time error recovery scheme that provides effective fault containment for data errors in cache memories. The scheme is based on broadcasting the data of a dirty cache line after it is modified, and it exploits the redundancy of a fault-tolerant system through hardware voting. The scheme recovers, with full coverage, from erroneous cache data written by a processor. This recovery feature remedies a shortcoming of error-correcting codes, which cannot prevent such errors. In addition, more than 60 percent of cache lines are fully covered for recovery from errors originating in the cache itself, including unrecoverable ECC errors. The protocol can also be used to speed up the CPU-cache reintegration process for a temporarily failed processor. The performance overhead of the protocol is small: only 2-3 percent of total memory references are broadcast.
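The abstract does not spell out the voting logic, so the following is only a minimal sketch of how hardware voting over a broadcast dirty line could behave in a triple modular redundant (TMR) system. It assumes a bitwise 2-of-3 majority vote over three replica copies of the line. The names `vote_line` and `broadcast_and_repair` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def vote_line(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-of-3 majority over three equally long cache-line copies."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all three copies must have the same length")
    # For each bit position, the result bit is set when at least two copies set it.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def broadcast_and_repair(copies: List[bytes]) -> Tuple[bytes, List[int]]:
    """Vote on the copies broadcast by the three replicas (assumed protocol step).

    Returns the voted line and the indices of replicas whose copy
    differed from the voted result and would need to be overwritten.
    """
    if len(copies) != 3:
        raise ValueError("TMR voting expects exactly three copies")
    voted = vote_line(copies[0], copies[1], copies[2])
    faulty = [i for i, line in enumerate(copies) if line != voted]
    return voted, faulty


# Illustrative data: replica 1 holds a single-bit error in the last byte.
good = bytes([0xDE, 0xAD, 0xBE, 0xEF])
bad = bytes([0xDE, 0xAD, 0xBE, 0xEE])
voted, faulty = broadcast_and_repair([good, bad, good])
```

In this sketch the voted line equals the two agreeing copies, and the disagreeing replica is flagged so its cache line can be rewritten with the voted data instead of triggering a full reintegration. Because the vote is bitwise, errors in two replicas are also masked as long as they do not hit the same bit position.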
