Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

ERSA: Error Resilient System Architecture for Probabilistic Applications


Abstract

There is a growing concern about the increasing vulnerability of future computing systems to errors in the underlying hardware. Traditional redundancy techniques are expensive for designing energy-efficient systems that are resilient to high error rates. We present Error Resilient System Architecture (ERSA), a robust system architecture which targets emerging killer applications such as recognition, mining, and synthesis (RMS) with inherent error resilience, and ensures high degrees of resilience at low cost. Using the concept of configurable reliability, ERSA may also be adapted for general-purpose applications that are less resilient to errors (but at higher costs). While resilience of RMS applications to errors in low-order bits of data is well-known, execution of such applications on error-prone hardware significantly degrades output quality (due to high-order bit errors and crashes). ERSA achieves high error resilience to high-order bit errors and control flow errors (in addition to low-order bit errors) using a judicious combination of the following key ideas: 1) asymmetric reliability in many-core architectures; 2) error-resilient algorithms at the core of probabilistic applications; and 3) intelligent software optimizations. Error injection experiments on a multicore ERSA hardware prototype demonstrate that, even at very high error rates of 20 errors/flip-flop/$10^{8}$ cycles (equivalent to 25,000 errors/core/s), ERSA maintains 90% or better accuracy of output results, together with minimal impact on execution time, for probabilistic applications such as K-Means clustering, LDPC decoding, and Bayesian network inference. In addition, we demonstrate the effectiveness of ERSA in tolerating high rates of static memory errors that are characteristic of emerging challenges related to SRAM $V_{\mathrm{ccmin}}$ problems and erratic bit errors.
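The abstract's distinction between low-order and high-order bit errors can be illustrated with a small sketch. This is not taken from the paper: it assumes data stored as IEEE 754 single-precision floats, and the helper name `flip_bit` and the sample value are hypothetical, chosen only to show how differently a single flipped bit can affect a stored number depending on its position.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of x's IEEE 754 single-precision encoding (bit 0 = LSB).

    Hypothetical helper for illustration; not part of ERSA.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", x))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


value = 3.0                        # assumed sample datum
low_error = flip_bit(value, 0)     # lowest mantissa bit flipped
high_error = flip_bit(value, 30)   # highest exponent bit flipped

print("original       :", value)
print("low-order flip :", low_error, "relative error", abs(low_error - value) / value)
print("high-order flip:", high_error, "relative error", abs(high_error - value) / value)
```

Under these assumptions, the low-order flip changes the value by a single unit in the last place, while the high-order flip changes its magnitude by many orders, which is consistent with the abstract's point that high-order bit errors, unlike low-order ones, significantly degrade output quality.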
