
Design of Energy-Efficient Many-Core MIMD GALS Processor Arrays in the 1000-Processor Era.

Abstract

As transistor sizes continue to scale down, more transistors can be placed in a fixed die size. The recent trend for general-purpose processing units is to use the increased number of transistors from process technology scaling to add more processing cores on a single die. At a certain point, it becomes untenable to continue adding cores with traditional architectures and communication systems, which necessitates a fundamental change in architectures to accommodate these cores. This paradigm shift requires new energy-efficient, high-performance algorithms and hardware designs tailored for many-core processor arrays, as they pose different challenges than a single-core or multi-core chip. With such large arrays of processors, communication, both between processors and to memories, becomes a limiting factor, requiring algorithms that work within these limitations as well as on-chip interconnect networks that make communication possible.

This dissertation offers three novel methods to perform a high-throughput, energy-efficient database sort of data records using a fine-grained many-core processor array. When measured against sorts created to fairly compare results, the most energy-efficient first-phase many-core sort requires over 83× lower energy than a quick sort performed on an Intel laptop-class processor and over 105× lower energy than a radix sort running on an Nvidia GPU. In addition, the highest-throughput first-phase many-core sort is over 10× faster than the quick sort and over 14× faster than the radix sort. Both phases of an entire 10 GB external sort require 6.9× lower energy×time (energy-delay product, EDP) than the quick sort and over 13× lower energy×time than the radix sort. The proposed sorts are easily programmed and scalable to a 2D mesh processor array of any size, while giving large energy savings without penalizing performance.

The dissertation presents the developed physical design flow and design methodology for creating a digital chip in the 1000-processor era. A number of design considerations are discussed, including module design, power grid design, power gate system design, physical DVFS requirements, communication, and chip-level layout.

The designs of both KiloCore and KiloCore2 are covered, as well as preliminary measured results from KiloCore, the first fabricated chip containing 1000 MIMD, programmable, independent processing cores on a single die. Early results show that KiloCore can operate at 5.8 pJ/Op while executing 115 billion Ops/sec at 0.56 V, and can reach up to 1.78 trillion Ops/sec at 1.1 V. KiloCore2 contains 697 programmable processors, two of which are optimized for high speed, one fast Fourier transform accelerator, and two Viterbi decoder accelerators. Both chips were fabricated in 32 nm partially depleted silicon-on-insulator (PD-SOI) technology. KiloCore2 contains multiple power rails, which allow each core to select a voltage based on its workload to save energy, with minimal voltage droop and minimal area.
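To make the quoted metrics concrete, the following is a minimal sketch (not taken from the dissertation) of how the energy-per-operation figure relates to average power, and how an energy-delay product comparison is formed. The function names are hypothetical, and the sort measurements in the EDP example are made-up placeholder values; only the 5.8 pJ/Op and 115 billion Ops/sec figures come from the abstract.

```python
# Figures quoted in the abstract for KiloCore at 0.56 V.
ENERGY_PER_OP_J = 5.8e-12      # 5.8 pJ/Op
THROUGHPUT_OPS_PER_S = 115e9   # 115 billion Ops/sec


def implied_power_watts(energy_per_op_j: float, ops_per_s: float) -> float:
    """Average power implied by an energy-per-op figure at a given throughput."""
    return energy_per_op_j * ops_per_s


def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """Energy-delay product (EDP): total energy multiplied by execution time."""
    return energy_j * delay_s


def edp_improvement(baseline_edp: float, candidate_edp: float) -> float:
    """How many times lower the candidate's EDP is than the baseline's."""
    return baseline_edp / candidate_edp


if __name__ == "__main__":
    power_w = implied_power_watts(ENERGY_PER_OP_J, THROUGHPUT_OPS_PER_S)
    print(f"Implied average power: {power_w:.3f} W")

    # Hypothetical placeholder measurements, NOT from the dissertation:
    # a baseline sort using 10 J over 2 s, a candidate using 2 J over 1 s.
    baseline = energy_delay_product(energy_j=10.0, delay_s=2.0)
    candidate = energy_delay_product(energy_j=2.0, delay_s=1.0)
    print(f"EDP improvement: {edp_improvement(baseline, candidate):.1f}x")
```

Under these assumptions, the quoted 0.56 V operating point corresponds to roughly 0.667 W of average power. Because EDP multiplies energy by time, a design that is both lower in energy and faster improves EDP by the product of the two ratios.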

Bibliographic details

  • Author: Stillmaker, Aaron Thomas
  • Author affiliation: University of California, Davis
  • Degree-granting institution: University of California, Davis
  • Subjects: Computer engineering; Electrical engineering
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 144 p.
  • Total pages: 144
  • Original format: PDF
  • Language of text: English