Performance optimisation strategies for automatically generated FPGA accelerators for biomedical models

Yu Ting; Oppermann Julian; Bradley Chris; Sinnen Oliver

Abstract

Biomedical modelling that is mathematically described by ordinary differential equations (ODEs) is often one of the most computationally intensive parts of a simulation. Because such models exhibit high inherent parallelism, hardware acceleration based on field programmable gate arrays (FPGAs) has great potential to increase the computational performance of ODE model integration while remaining very power efficient. The ODE-based Domain-specific Synthesis Tool, which we proposed previously, automatically generates the complete hardware/software co-design framework for computing CellML-based biomedical models. Although it already provides substantial performance improvements and high energy efficiency compared with CPUs and GPUs, considerable potential for optimisation remains. In this paper, we investigate a set of optimisation strategies: compiler optimisation, resource fitting and balancing, and multiple pipelines. All of them can be performed automatically and can therefore be integrated into our domain-specific high-level synthesis tool. We evaluate the optimised hardware accelerator modules generated by the ODE-based Domain-specific Synthesis Tool on real hardware in terms of resource usage, processing speed and power consumption. The results are compared with single-threaded and multi-core CPUs, with and without Streaming SIMD Extensions (SSE) optimisation, and with a graphics card. The results show that the proposed optimisation strategies significantly improve performance and yield even more energy-efficient hardware accelerator modules. Furthermore, the resources of the target FPGA device are utilised more efficiently, so that larger biomedical models can be fitted than before. Copyright © 2015 John Wiley & Sons, Ltd.
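The abstract lists resource fitting and balancing, and multiple pipelines, among the optimisations. As a rough illustration only: the number of identical pipeline replicas that fit on one device is bounded by whichever resource runs out first, so shifting load between resource types can raise that bound. The Python sketch below computes the bound. The function name `max_pipelines`, every resource figure, the 90% utilisation ceiling, and the notion of "balancing" as moving floating-point operators from DSP blocks into LUT logic are all assumptions made for illustration; they are not the paper's algorithm or measurements.

```python
# Hypothetical sketch: how many identical ODE pipelines fit on one FPGA?
# All resource figures below are illustrative assumptions, not data from the paper.

def max_pipelines(device, per_pipeline, shared, cap_percent=90):
    """Return (count, limiting_resource) for replicating one pipeline.

    device:       total resources on the target FPGA
    per_pipeline: resources consumed by one pipeline replica
    shared:       resources used once by shared logic (e.g. host interface)
    cap_percent:  utilisation ceiling, leaving headroom for routing
    """
    best = None
    for res, need in per_pipeline.items():
        if need <= 0:
            continue  # this resource does not constrain replication
        # Integer arithmetic avoids floating-point rounding at the cap.
        budget = device[res] * cap_percent // 100 - shared.get(res, 0)
        fit = max(0, budget // need)
        if best is None or fit < best[0]:
            best = (fit, res)
    if best is None:
        raise ValueError("per_pipeline must use at least one resource")
    return best


DEVICE = {"LUT": 303_600, "FF": 607_200, "DSP": 2_800, "BRAM": 1_030}
SHARED = {"LUT": 10_000, "FF": 12_000, "DSP": 0, "BRAM": 50}

# Assumed baseline: floating-point operators mapped mostly onto DSP blocks.
dsp_heavy = {"LUT": 40_000, "FF": 50_000, "DSP": 600, "BRAM": 20}
# Assumed "balanced" variant: some operators moved from DSP blocks into LUTs.
balanced = {"LUT": 48_000, "FF": 55_000, "DSP": 350, "BRAM": 20}

print(max_pipelines(DEVICE, dsp_heavy, SHARED))  # (4, 'DSP')
print(max_pipelines(DEVICE, balanced, SHARED))   # (5, 'LUT')
```

With these made-up numbers the DSP-heavy variant is DSP-limited to four replicas, while the variant that spends more LUTs and fewer DSPs becomes LUT-limited at five. The point of the sketch is that the scarcest resource, not total utilisation, decides how many pipelines fit.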

Bibliographic details

Source: Concurrency and Computation: Practice and Experience, 2016, no. 5, pp. 1480-1506 (27 pages)
Authors: Yu Ting; Oppermann Julian; Bradley Chris; Sinnen Oliver
Author affiliations

Department of Electrical and Computer Engineering, University of Auckland, Auckland, New Zealand

Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand

Embedded Systems and Applications Group, Technische Universität Darmstadt, Darmstadt, Germany

Original format: PDF
Language: English
Keywords: optimisation strategies; high-performance computing; domain-specific high-level synthesis

Similar literature

1. Automatic Compilation of Diverse CNNs Onto High-Performance FPGA Accelerators [J]. Ma Yufei, Cao Yu, Vrudhula Sarma. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, no. 2.
2. Automatic parallelisation for LTI MIMO state space systems using FPGAs. An optimisation for cost & performance [J]. B. Apopei, T. J. Dodd. Journal of Parallel and Distributed Computing, 2012, no. 8.
3. Performance Modeling for CNN Inference Accelerators on FPGA [J]. Ma Yufei, Cao Yu, Vrudhula Sarma. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, no. 4.
4. FPGA implementation of a license plate recognition SoC using automatically generated streaming accelerators [C]. Bellas N., Chai S.M., Dwyer M. IEEE International Parallel and Distributed Processing Symposium, 2006.
5. A Hybrid Partially Reconfigurable Overlay Supporting Just-In-Time Assembly of Custom Accelerators on FPGAs [D]. Aklah, Zeyad Tariq. 2017.
6. BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features [O]. Richard Tzong-Han Tsai, Wen-Chi Chou, Ying-Shan Su. 2007.
7. Performance Modeling of Stencil Computing on a Stream-Based FPGA Accelerator for Efficient Design Space Exploration [O]. Keisuke Dohi, Koji Okina, Rie Soejima. 2015.