Journal: ACM Transactions on Embedded Computing Systems

A Hardware Framework for Yield and Reliability Enhancement in Chip Multiprocessors



Abstract

Device reliability and manufacturability have emerged as dominant concerns in end-of-road CMOS devices, and an increasing number of hardware failures are attributed to manufacturability or reliability problems. Maintaining an acceptable manufacturing yield for chips that contain tens of billions of transistors with wide variations in device parameters has been identified as a great challenge. Additionally, today's nanometer-scale devices suffer from accelerated aging because of the extreme operating temperatures and electric fields to which they are subjected. Unless addressed in design, aging-related defects can significantly reduce the lifetime of a product. In this article, we investigate a micro-architectural scheme for improving the yield and reliability of homogeneous chip multiprocessors (CMPs). The proposed solution is a hardware framework that exploits the redundancy inherent in a multicore system to keep the system operational in the face of partial failures. A micro-architectural modification allows a faulty core in a CMP to use another core's resources to service any instruction that the faulty core cannot execute correctly by itself. This service improves yield and reliability but may cause a loss of performance. The target platforms for quantitative evaluation of performance under degradation are dual-core and quad-core chip multiprocessors with one or more cores sustaining partial failure. Simulation studies indicate that when a large, high-latency, and sparingly used unit such as a floating-point unit fails in a core, correct execution may be sustained through outsourcing with at most a 16% impact on performance for a floating-point-intensive application. For applications with moderate floating-point load, the degradation is insignificant. The performance impact may be mitigated even further by judicious selection of the cores to commandeer, depending on the current load on each of the candidate cores. The area overhead is also negligible due to resource reuse.
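
The outsourcing idea in the abstract, including load-aware selection of the core to commandeer, can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware mechanism: the `Core` class, the `pending_ops` load metric, the tie-breaking rule, and the function names `pick_helper` and `execute_fp_add` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    has_working_fpu: bool = True
    pending_ops: int = 0  # hypothetical load metric: operations queued on this core


def pick_helper(cores, faulty_id):
    """Return the least-loaded core with a working FPU, excluding the faulty core.

    Ties are broken by the lowest core_id (an arbitrary, assumed policy).
    Returns None when no healthy candidate exists.
    """
    candidates = [c for c in cores if c.core_id != faulty_id and c.has_working_fpu]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c.pending_ops, c.core_id))


def execute_fp_add(cores, core_id, a, b):
    """Run a floating-point add issued by core_id.

    Assumes cores[i].core_id == i. If the issuing core's FPU is faulty, the
    operation is outsourced to a helper core. Returns (result, serving_core_id).
    """
    core = cores[core_id]
    if core.has_working_fpu:
        core.pending_ops += 1
        return a + b, core.core_id
    helper = pick_helper(cores, core_id)
    if helper is None:
        raise RuntimeError("no core with a working FPU is available")
    helper.pending_ops += 1  # the outsourced work adds to the helper's load
    return a + b, helper.core_id


# Quad-core example: core 0 has a failed FPU; the other cores carry different loads.
cores = [
    Core(0, has_working_fpu=False),
    Core(1, pending_ops=5),
    Core(2, pending_ops=2),
    Core(3, pending_ops=2),
]
result, served_by = execute_fp_add(cores, 0, 1.5, 2.25)
print(result, served_by)
```

In this model the faulty core stays usable for everything except floating-point work, and each outsourced operation goes to whichever healthy core is currently least busy. Real hardware would also pay an inter-core transfer latency, which the sketch does not model.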
