Accelerator-rich architectures — from single-chip to datacenters


Abstract

To drastically improve energy efficiency, we believe that future processor architectures will make extensive use of accelerators, from single-chip implementation to datacenter-level integration, as custom-designed accelerators often provide 10-1000X better performance/energy efficiency than general-purpose processors [1]. Such an accelerator-rich architecture is a fundamental departure from the classical von Neumann architecture, which emphasizes efficient sharing of a common pipeline among the executions of different instructions, an elegant solution when computing resources are scarce. In contrast, the accelerator-rich architecture features heterogeneity and customization for energy efficiency, which is better suited to energy-constrained designs where silicon resources are abundant. There are several concerns with the extensive use of accelerators: (1) low utilization, (2) narrow workload coverage, (3) high design cost, and (4) unfamiliar programming interfaces. In this talk, I shall discuss recent progress and ongoing work to address these concerns. Due to tight power and thermal budgets, only a fraction of the computing elements on a chip can be active at once in future technologies (so-called dark silicon [2]). This means that low utilization (but much higher energy efficiency) will be an inherent characteristic of future chips. To address the problem of narrow workload coverage, we look to composable accelerators and programmable fabrics to virtualize and accelerate larger blocks of computation [3]. The design cost can be properly managed by leveraging recent advances in high-level synthesis, coupled with efficient generation of parameterized architecture templates. The programming interface is a critical issue for the successful adoption of accelerator-rich architectures. It needs to support extensive use of accelerators from single-chip to datacenter scales [4]. We have made significant progress in compilation and runtime support that enables programmers to use existing programming interfaces (e.g. C/C++ for computation tasks, and MapReduce or Hadoop for large-scale distributed computation in datacenters) to make efficient use of accelerators at all scales.
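As a rough illustration of the last point, the sketch below shows how a runtime might let a programmer keep calling an ordinary C++ function while the work is routed to an accelerator when one is available. It is not the system described in the talk: `AcceleratorRuntime`, `register_accelerator`, and the `vec_add` kernel are hypothetical names invented for this example, and the "accelerator" is simulated by a plain callable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical kernel signature: element-wise vector addition.
using VecAddFn = std::function<void(const std::vector<float>&,
                                    const std::vector<float>&,
                                    std::vector<float>&)>;

// Plain C++ reference implementation; always available as the fallback.
void vec_add_cpu(const std::vector<float>& a,
                 const std::vector<float>& b,
                 std::vector<float>& out) {
    assert(a.size() == b.size());  // both inputs must have the same length
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical runtime (an assumption of this sketch): maps a kernel name
// to an accelerator-backed implementation if one has been registered.
class AcceleratorRuntime {
public:
    void register_accelerator(const std::string& kernel, VecAddFn impl) {
        table_[kernel] = std::move(impl);
    }

    // The caller sees one ordinary function; dispatch happens inside.
    void vec_add(const std::vector<float>& a,
                 const std::vector<float>& b,
                 std::vector<float>& out) const {
        auto it = table_.find("vec_add");
        if (it != table_.end()) {
            it->second(a, b, out);   // accelerator path
        } else {
            vec_add_cpu(a, b, out);  // general-purpose fallback
        }
    }

private:
    std::map<std::string, VecAddFn> table_;
};
```

The design choice being illustrated is that application code calls `vec_add` the same way whether or not an accelerator has been registered, so existing code does not need to change when hardware does.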


Bibliographic record

Source: IEEE/ACM International Symposium on Low Power Electronics and Design, 2014, 1 page
Author: Cong Jason
Original format: PDF
Chinese Library Classification: Vacuum electronics
Keywords: Energy-efficient computing; accelerators


Similar literature

1. Architecture of a Single-Chip 50 Gb/s DP-QPSK/BPSK Transceiver With Electronic Dispersion Compensation for Coherent Optical Channels [J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2014, no. 4

2. Programming many-core architectures - a case study: dense matrix computations on the Intel single-chip cloud computer processor [J]. J. P. E. Hodgson. Computing Reviews, 2013, no. 2

3. Programming many-core architectures - a case study: dense matrix computations on the Intel single-chip cloud computer processor [J]. Bryan Marker, Ernie Chan, Jack Poulson. Concurrency and Computation: Practice and Experience, 2012, no. 12

4. Accelerator-rich architectures — from single-chip to datacenters [C] . Cong Jason IEEE/ACM International Symposium on Low Power Electronics and Design . 2014

5. Memory System Optimizations for Customized Computing -- From Single-Chip to Datacenter. [D] . Chen, Yu-Ting. 2016

6. SDTCP: Towards Datacenter TCP Congestion Control with SDN for IoT Applications [O] . Yifei Lu, Zhen Ling, Shuhong Zhu, 2017

7. On-chip interconnection network for accelerator-rich architectures [O] . Jason Cong, Michael Gill, Yuchen Hao, 2015
