Soft Errors: The Hardware-Software Interface


Abstract

A recent report from the ITRS identifies soft errors as one of the most important reliability challenges for the coming decades. Soft errors are transient errors caused by several effects, e.g., voltage fluctuations, wire crosstalk, and cosmic particle strikes, and they manifest as a temporary switch of the logic value of a transistor. While it is not possible to prove or disprove that a particular failure was caused by a soft error, several fiscal disasters, e.g., the Sun server crashes in 2000 and the HP server crashes in 2005, have been attributed to soft errors. Industry has moved from ignoring soft errors to investing design effort in protecting against them. For instance, in nVIDIA's recently announced Fermi GPUs, the L1 cache, L2 cache, and register files are ECC protected. Although the soft error rate is about once per year today, it is expected to reach alarming levels of once per day in about a decade or two. Researchers are busy finding cost-effective solutions to protect computing devices from soft errors. This tutorial will attempt to cover the entire gamut of soft error protection techniques, but will focus in particular on soft error mitigation techniques at the hardware/software interface. Much of the time will be spent on microarchitectural, compiler, and hybrid compiler-microarchitectural techniques for soft error mitigation. The tutorial will be particularly useful for budding researchers who are fascinated by soft errors and want to explore them as a research direction. For such researchers, it will be a one-stop shop for learning about and analyzing seminal research in soft error mitigation at several design layers. Developers who have been working on soft errors at one level will get a picture of what can be done at other levels, so that they can provide complementary cross-layer protection. Finally, researchers and developers working on other aspects of system design can learn how soft errors are going to affect them.
