Soft Errors: The Hardware-Software Interface

Abstract

A recent report from the ITRS identifies soft errors as one of the most important reliability challenges for the coming decades. Soft errors are transient errors caused by effects such as voltage fluctuations, crosstalk between wires, and cosmic particle strikes. They manifest as a temporary flip of a transistor's logic value. Although it is not possible to prove or disprove that a particular failure was caused by a soft error, several costly failures, e.g., Sun server crashes in 2000 and HP server crashes in 2005, have been attributed to soft errors. Industry has moved from ignoring soft errors to investing design effort in protecting against them. For instance, in nVIDIA's recently announced Fermi GPUs, the L1 cache, L2 cache, and register files are ECC protected. Although the soft error rate is about once per year today, it is expected to reach an alarming once per day within one to two decades. Researchers are actively looking for cost-effective ways to protect computing devices from soft errors.

This tutorial will attempt to cover the entire gamut of soft error protection techniques, but will focus particularly on mitigation techniques at the hardware/software interface, spending most of its time on microarchitectural, compiler, and hybrid compiler-microarchitectural techniques. It will be especially useful for budding researchers who are fascinated by soft errors and want to explore them as a research direction. For such researchers, the tutorial will be a one-stop shop for learning about and analyzing seminal research on soft error mitigation at several design layers. Developers who have been working on soft errors at one level will get a picture of what can be done at other levels, so that they can provide complementary cross-layer protection. Finally, researchers and developers working on other aspects of system design can learn how soft errors are going to affect them.
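To make concrete what "ECC protected" buys against a soft error, the sketch below models a soft error as a single flipped bit and corrects it with a textbook Hamming(7,4) code. This is an illustrative assumption, not the scheme used in Fermi or any other product: the function names are hypothetical, and real cache and register-file ECC typically uses wider single-error-correct, double-error-detect (SEC-DED) codes.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword layout by position: [p1, p2, d1, p4, d2, d3, d4]
    return [p1, p2, d1, p4, d2, d3, d4]


def flip_bit(word: List[int], pos: int) -> List[int]:
    """Hypothetical soft-error model: invert the bit at 1-based position pos."""
    out = list(word)
    out[pos - 1] ^= 1
    return out


def hamming74_decode(word: List[int]) -> Tuple[List[int], int]:
    """Return (corrected data bits, syndrome); a syndrome of 0 means no error detected."""
    c = [0] + list(word)  # pad so that c[1]..c[7] match codeword positions
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of a single flipped bit
    if syndrome:
        c[syndrome] ^= 1  # correct the flipped bit in place
    return [c[3], c[5], c[6], c[7]], syndrome


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    codeword = hamming74_encode(data)
    corrupted = flip_bit(codeword, 5)  # simulate a particle strike on bit 5
    recovered, syndrome = hamming74_decode(corrupted)
    print(recovered, syndrome)  # [1, 0, 1, 1] 5
```

The syndrome directly names the position of the flipped bit, which is why a single transient flip in a protected word can be corrected transparently. A flip in two bits of the same word would be miscorrected by this code, which is one reason hardware adds the extra double-error-detection bit.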