Conference: IEEE International Symposium on High Performance Computer Architecture

CARE: Coordinated Augmentation for Elastic Resilience on DRAM Errors in Data Centers

Abstract

As computation density and memory capacity continue to grow, DRAM errors have become the leading cause of server crashes and system failures in modern data centers. Although myriad techniques have been proposed to mitigate their impact on system reliability, these solutions either incur significant overhead in performance, power, and memory capacity or require modifying multiple system components; hence, they are impractical to implement or deploy. This paper proposes CARE, a novel error tolerance framework for efficient and elastic resilience to DRAM errors. It introduces a cache-like structure in the memory controller for dynamic error tracking and proactive resilience enhancement, achieving high error tolerance economically and practically. Experimental results show that, with about 58 KB of area overhead in the memory controller, CARE achieves near-Chipkill reliability without any memory capacity penalty and incurs negligible performance overhead compared with baseline SEC-DED systems. CARE provides an attractive alternative for enhancing reliability in data centers.