A Memory Profiling Framework for Stencil Computation on an FPGA Accelerator with High Level Synthesis

Rie Soejima; Koji Okina; Keisuke Dohi; Yuichiro Shibata; Kiyoshi Oguri

Abstract

In this paper, we propose a framework to assist memory access optimization for stencil computation on an FPGA accelerator. Because stencil computations such as scientific simulations require large amounts of data, efficient memory access is key to achieving high performance on FPGA accelerators. We therefore implemented a stencil computation framework with a memory performance profiler on MaxCompiler, a high-level synthesis system. The memory profiler measures the clock cycles spent in each memory controller state: data transfer, stall, and idle. We also implemented simple stencil computations and practical FDTD electromagnetic field simulations on top of the framework, with various parameters, to evaluate and analyze memory performance. Execution experiments of the simple stencil computations on a MAX34245A Data Flow Engine demonstrated that approximately 70% of the peak memory performance could be achieved for various stencil types. On the other hand, the FDTD simulations, which need many data streams, could not reach this memory performance saturation point because of the increasing complexity of the memory controller modules. Based on an analysis of the evaluation results obtained with our memory performance profiling framework, we suggest a promising memory access optimization approach for stencil computations in which the complexity of the memory controller is traded off against data access traffic.
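
The profiler's core idea, attributing every clock cycle to one memory controller state, can be sketched in a few lines. The following is a minimal Python illustration under stated assumptions, not the authors' MaxCompiler implementation: it takes a hypothetical per-cycle trace of states (`MemState`, `profile_states`, and the trace format are invented for illustration), counts the cycles spent in each state, and reports the share of transfer cycles as a rough proxy for achieved memory performance relative to peak, assuming the controller can move at most one data beat per cycle.

```python
from collections import Counter
from enum import Enum


class MemState(Enum):
    """Hypothetical memory controller states, mirroring the three in the abstract."""
    TRANSFER = "transfer"  # assumed: a data beat moves between DRAM and the kernel
    STALL = "stall"        # assumed: a request is pending but no data moves
    IDLE = "idle"          # assumed: no request is outstanding


def profile_states(trace):
    """Count cycles per state in a per-cycle trace; return (counts, efficiency).

    Efficiency is transfer cycles / total cycles, i.e. achieved bandwidth as a
    fraction of peak, under the assumption that peak means one beat per cycle.
    """
    counts = Counter(trace)
    total = sum(counts.values())
    efficiency = counts[MemState.TRANSFER] / total if total else 0.0
    return counts, efficiency


if __name__ == "__main__":
    # Synthetic trace for illustration only (not measured data).
    trace = [MemState.TRANSFER] * 6 + [MemState.STALL] * 3 + [MemState.IDLE]
    counts, eff = profile_states(trace)
    for state in MemState:
        print(f"{state.value:>8}: {counts[state]}")
    print(f"efficiency: {eff:.0%}")
```

With this synthetic trace, the script prints 6 transfer, 3 stall, and 1 idle cycle and an efficiency of 60%. Under the state definitions assumed here, keeping stall cycles separate from idle cycles is what lets such a breakdown distinguish a controller waiting on memory from a kernel that is not issuing requests at all.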

Bibliographic Record

Source
Computer Architecture News, 2014, No. 4, pp. 69-74 (6 pages)
Authors
Rie Soejima; Koji Okina; Keisuke Dohi; Yuichiro Shibata; Kiyoshi Oguri
Affiliations

Graduate School of Engineering, Nagasaki University, Japan (all five authors)

Record Information
Original format: PDF
Text language: English

Similar Documents

1. Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth [J]. Sano K., Hatsuda Y., Yamamoto S. IEEE Transactions on Parallel and Distributed Systems, 2014, No. 3.

2. Power Performance Profiling of 3-D Stencil Computation on an FPGA Accelerator for Efficient Pipeline Optimization [J]. Koji Okina, Rie Soejima, Kota Fukumoto. Computer Architecture News, 2015, No. 4.

3. PACC: a directive-based programming framework for out-of-core stencil computation on accelerators [J]. Nobuhiro Miki, Fumihiko Ino, Kenichi Hagihara. International Journal of High Performance Computing and Networking, 2019, No. 1.

4. Towards a Low-Power Accelerator of Many FPGAs for Stencil Computations [C]. Kobayashi Ryohei, Takamaeda-Yamazaki Shinya, Kise Kenji. 2012 Third International Conference on Networking and Computing, 2012.

5. Implementation of Long Short-term Memory Neural Networks in High-level Synthesis Targeting FPGAs [D]. Rao, Richa. 2020.

6. Families of FPGA-Based Accelerators for Approximate String Matching [O]. Tom Van Court, Martin C. Herbordt.

7. Multi-FPGA Accelerator Architecture for Stencil Computation Exploiting Spacial and Temporal Scalability [O]. Hasitha Muthumala Waidyasooriya, Masanori Hariyama. 2019.
