
Delivering Affordable Fault-tolerance to Commodity Computer Systems.



Abstract

To meet an insatiable consumer demand for greater performance at less power, silicon technology has scaled to unprecedented dimensions. This aggressive scaling has provided designers with an ever-increasing budget of cheaper and faster transistors. Unfortunately, this trend has also been accompanied by a decline in individual device reliability as transistors have become increasingly susceptible to a host of threats.

With each new technology generation, the challenges associated with process variation, wearout, and transient faults gain greater prominence. We are quickly approaching a new era where fault-tolerance is becoming a first-order design constraint, no longer a luxury reserved exclusively for high-reliability, mission-critical domains. Even commodity microprocessors used in mainstream computing will require protection.

However, just as the reliability needs of NASA and Apple differ dramatically, so does their ability to absorb the costs necessary to ensure fault-tolerance. Viable solutions targeting commodity systems must not only recognize this fact, but must embrace it. Simply stripping down techniques developed for enterprise servers may not result in the most appropriate designs for a laptop or cellphone. The best solutions will exploit the relaxed reliability constraints of commodity systems, judiciously sacrificing a small degree of fault-tolerance to achieve far greater reductions in overhead costs.

This thesis proposes a collection of works that can be selectively mixed and matched to assemble reliability solutions tailor-fit for the commodity systems community. Although the works presented address a variety of different issues, from wearout to transient faults and from prevention to detection, they were all motivated by the same observation: much of the overhead cost associated with conventional fault-tolerance mechanisms is spent in pursuit of the last few "nines" of reliability. This conclusion gave rise to the philosophy permeating the chapters of this work, that summarily dismissing techniques that cannot supply mission-critical fault-tolerance is no longer acceptable. In presenting concrete solutions to a few of the more interesting challenges (proactive wear-leveling orchestrated through intelligent job scheduling, and software-only transient fault detection and recovery that exploits intrinsic computational patterns within applications), we establish fundamental principles that can be applied more broadly to formulate a comprehensive reliability strategy.

Bibliographic record

  • Author: Feng, Shuguang
  • Author affiliation: University of Michigan
  • Degree-granting institution: University of Michigan
  • Subject: Engineering Computer; Engineering Electronics and Electrical; Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 172 p.
  • Format: PDF
  • Language: English
