Journal: Concurrency and Computation: Practice and Experience

Performance optimisation strategies for automatically generated FPGA accelerators for biomedical models


Abstract

Biomedical models that are described mathematically by ordinary differential equations (ODEs) are often among the most computationally intensive parts of a simulation. Given the high inherent parallelism involved, hardware acceleration based on field-programmable gate arrays (FPGAs) has great potential to increase the computational performance of ODE model integration while remaining very power efficient. The ODE-based Domain-specific Synthesis Tool, which we proposed previously, automatically generates a complete hardware/software co-design framework for computing biomedical models based on CellML. Although it provides remarkable performance improvement and high energy efficiency compared with CPUs and GPUs, there is still considerable potential for optimisation. In this paper, we investigate a set of optimisation strategies: compiler optimisation, resource fitting and balancing, and multiple pipelines. All of them can be performed automatically and can therefore be integrated into our domain-specific high-level synthesis tool. We evaluate the optimised hardware accelerator modules generated by the ODE-based Domain-specific Synthesis Tool on real hardware in terms of resource usage, processing speed and power consumption. The results are compared with single-threaded and multi-core CPUs, with and without Streaming SIMD Extensions (SSE) optimisation, and with a graphics card. The results show that the proposed optimisation strategies provide significant performance improvement and yield even more energy-efficient hardware accelerator modules. Furthermore, the resources of the target FPGA device can be utilised more efficiently, so that larger biomedical models fit than before. Copyright © 2015 John Wiley & Sons, Ltd.
