Journal: Computer Architecture News

Domain-Specific Programmable Design of Scalable Streaming-Array for Power-Efficient Stencil Computation



Abstract

This paper presents a domain-specific programmable design of custom computing machines for high-performance stencil computation. Stencil computation is a typical kernel in scientific computing; however, its low operational intensity means that sustained performance on recent microprocessors and GPUs is limited by memory bandwidth. We have previously proposed a scalable streaming array (SSA) of processing elements, which provides almost linear scalability as FPGAs are added while the external-memory bandwidth stays constant. To facilitate custom computing and use hardware resources efficiently for varied and complex stencil computations, we design a programmable SSA with limited but necessary functionality. We describe the design concept, the programmable structure, and the SIMD instruction set for the SSA. A prototype implementation with nine FPGAs demonstrates that our programmable design, which contains many floating-point units, exploits hardware resources well, achieving 260 GFlop/s, which is 87.4% of the peak, at 1295 MFlop/s per watt.
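To make the memory-bandwidth argument concrete, the following is a minimal sketch of a stencil kernel and its operational intensity. The abstract does not say which stencils were evaluated, so the 5-point averaging stencil, the 4-byte element size, and all function and constant names here are illustrative assumptions. The last lines only rearrange the figures quoted in the abstract; the implied peak and power values are derived, not reported.

```python
def jacobi_step(grid):
    """Apply one sweep of a 5-point averaging stencil to the interior of a 2D grid.

    Hypothetical example kernel; boundary cells are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.2 * (
                grid[i][j]
                + grid[i - 1][j] + grid[i + 1][j]
                + grid[i][j - 1] + grid[i][j + 1]
            )
    return out


def operational_intensity(flops_per_cell, bytes_per_cell):
    """Floating-point operations performed per byte of memory traffic."""
    return flops_per_cell / bytes_per_cell


# Assumption: 4 additions + 1 multiplication per cell.
FLOPS_PER_CELL = 5
# Assumption: 4-byte floats, one read and one write per cell, with
# neighbouring values perfectly reused from on-chip buffers.
BYTES_PER_CELL = 8

intensity = operational_intensity(FLOPS_PER_CELL, BYTES_PER_CELL)

# Figures quoted in the abstract.
SUSTAINED_GFLOPS = 260.0
FRACTION_OF_PEAK = 0.874
MFLOPS_PER_WATT = 1295.0

# Derived values (not stated in the abstract).
implied_peak_gflops = SUSTAINED_GFLOPS / FRACTION_OF_PEAK
implied_power_watts = SUSTAINED_GFLOPS * 1000.0 / MFLOPS_PER_WATT

if __name__ == "__main__":
    print(f"operational intensity: {intensity:.3f} flop/byte")
    print(f"implied peak: {implied_peak_gflops:.1f} GFlop/s")
    print(f"implied power: {implied_power_watts:.1f} W")
```

Under these assumptions the kernel performs well under one floating-point operation per byte moved, which is why a design that streams data through many processing elements without extra external-memory traffic can help.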
