Journal: Microprocessors and Microsystems

Design space exploration of multi-core RTL via high level synthesis from OpenCL models



Abstract

As increasingly powerful integrated circuits appear on the market, more and more applications, with very different requirements and workloads, make use of the available computing power. Designing optimized accelerators that meet particular requirements has always been a tremendous challenge for hardware engineers. To do so, designers have to trade off performance against power consumption so that the final RTL consumes the minimum energy needed to meet the required performance target (e.g. in FLOPS). Moreover, the growing trend towards heterogeneous platforms is crucial to meeting the time and power consumption constraints of high-performance computing (HPC) applications. The OpenCL parallel programming language and framework enables programming CPUs, GPUs and, more recently, FPGAs using the high-level synthesis (HLS) methodology. This work presents a design space exploration (DSE) flow based on the execution time, resource utilization and power consumption of OpenCL kernels mapped onto FPGAs using the Xilinx high-level synthesis tool chain. Our experiments suggest that the quality of the generated solutions, in terms of performance-per-watt, can be determined using analytical formulas prior to implementation, thus enabling fast and accurate DSE by considering on-chip and off-chip sources of parallelism. Moreover, the automated flow suggests design hints to meet a given time constraint within the available resources. The proposed technique is demonstrated by optimizing the well-known bitonic sorting network from NVIDIA’s OpenCL benchmark. Our results show that FPGAs have at least 20% higher performance-per-watt than two high-end GPUs manufactured in the same technology (28 nm). Additionally, FPGAs with more available resources and a more modern process (20 nm) can outperform the tested GPUs while consuming at least 55% less power, at the cost of more expensive devices.
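For readers unfamiliar with the bitonic sorting network named in the abstract, the following is a minimal sequential sketch in Python. It is an illustration of the general algorithm only, not the NVIDIA benchmark kernel or the authors' implementation; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for this example.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Assumption for this sketch: the input length is a nonzero power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    a = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            # Every compare-exchange in this inner loop touches a disjoint
            # pair, so the iterations are independent of each other.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = a[i] > a[partner] if ascending else a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Because the compare-exchange operations within one `(k, j)` stage are independent, a network of this shape is a natural fit for data-parallel execution, for example one work-item per element pair in an OpenCL kernel.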
