Journal: ACM Transactions on Reconfigurable Technology and Systems

Reducing the Performance Gap between Soft Scalar CPUs and Custom Hardware with TILT

Abstract

By using resource-sharing field-programmable gate array (FPGA) compute engines, we can reduce the performance gap between soft scalar CPUs and resource-intensive custom datapath designs. This article demonstrates that the Thread- and Instruction-Level parallel Template architecture (TILT), a programmable FPGA-based horizontally microcoded compute engine designed to highly utilize floating-point (FP) functional units (FUs), can significantly improve the average throughput of eight FP-intensive applications compared to a soft scalar CPU (similar to an FP-extended Nios). For eight benchmark applications, we show that: (i) a base TILT configuration having a single instance of each FU type can improve performance over a soft scalar CPU by 15.8x, while requiring on average 26% of the custom datapaths' area; (ii) selectively increasing the number of FUs can more than double TILT's average throughput, reducing the custom-datapath throughput gap from 576x to 14x; and (iii) replicated instances of the most computationally dense TILT configuration that fit within the area of each custom datapath design can reduce the gap to 8.27x, while replicated instances of application-tuned configurations of TILT can reduce the custom-datapath throughput gap to an average of 5.22x, and to as low as 3.41x for the Matrix Multiply benchmark. Last, we present methods for design space reduction, and we correctly predict the computationally densest design for seven out of eight benchmarks.