Efficient Memory Partitioning for Parallel Data Access in FPGA via Data Reuse

Jincheng Su; Fan Yang; Xuan Zeng; Dian Zhou; Jie Chen

Abstract

Parallelizing the memory accesses in a nested loop is a critical challenge in enabling loop pipelining. An effective approach in high-level synthesis for field-programmable gate arrays (FPGAs) is to map these accesses to multiple on-chip memory banks using a memory partitioning technique. In this paper, we propose an efficient memory partitioning algorithm with low overhead and low time complexity for parallel data access via data reuse. We find that in most image and video processing applications, a large amount of data can be reused across different iterations of a loop nest. Motivated by this observation, we propose to cache reusable data in on-chip registers organized as register chains. The nonreusable data are then distributed across several memory banks by a memory partitioning algorithm. We revise the existing padding method to cover a case that occurs frequently in our method, in which certain components of the partition vector are zero. Experimental results demonstrate that, compared with state-of-the-art algorithms, the proposed method is efficient in terms of execution time, resource overhead, and power consumption across a wide range of access patterns extracted from image and video processing applications. For the tested patterns, the execution time is typically less than one millisecond. The number of required memory banks is reduced by 59.7% on average, which leads to average reductions of 78.2% in look-up tables, 65.5% in flip-flops, and 37.1% in DSP48Es, and consequently a 74.8% reduction in dynamic power consumption. Moreover, the proposed method incurs zero storage overhead for the most widely used access patterns in image filtering.
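To make the idea of "cache reusable data in registers, partition only the rest" concrete, here is a minimal, hypothetical Python sketch. It is not the authors' algorithm. It assumes the widely used linear-transformation bank mapping B(x) = (alpha . x) mod N, a 3x3 image filter whose innermost loop advances the column index by 1, and that any pixel read in the previous iteration can be held in a register chain. The function names (`nonreusable_offsets`, `bank`, `is_conflict_free`, `min_banks`) are illustrative only.

```python
"""Hypothetical sketch: data reuse plus linear memory partitioning for a 3x3 stencil."""

# Offsets (di, dj) read at every iteration (i, j) of a 3x3 image filter.
STENCIL = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def nonreusable_offsets(offsets, step=(0, 1)):
    """Return the offsets whose data was not read in the previous iteration.

    Assumes the innermost loop advances (i, j) by `step`. The element at
    offset o now sat at offset o + step in the previous iteration, so it can
    be held in a register chain whenever o + step is also in the pattern.
    """
    pattern = set(offsets)
    return [o for o in offsets
            if (o[0] + step[0], o[1] + step[1]) not in pattern]


def bank(x, alpha, n_banks):
    """Assumed linear-transformation mapping: B(x) = (alpha . x) mod N."""
    return sum(a * xi for a, xi in zip(alpha, x)) % n_banks


def is_conflict_free(offsets, alpha, n_banks):
    """True if all offsets land in distinct banks for every base (i, j).

    B((i, j) + o) = (alpha . (i, j) + alpha . o) mod N, and adding the same
    base term mod N is a bijection, so comparing alpha . o mod N suffices.
    """
    banks = {bank(o, alpha, n_banks) for o in offsets}
    return len(banks) == len(offsets)


def min_banks(offsets, alpha, limit=64):
    """Smallest conflict-free N (at least one bank per access), or None."""
    for n in range(len(offsets), limit + 1):
        if is_conflict_free(offsets, alpha, n):
            return n
    return None


if __name__ == "__main__":
    new_data = nonreusable_offsets(STENCIL)
    print("accesses needing memory per iteration:", new_data)
    print("banks, all 9 accesses, alpha=(3, 1):", min_banks(STENCIL, (3, 1)))
    print("banks, 3 new accesses, alpha=(1, 0):", min_banks(new_data, (1, 0)))
```

Running it prints the three new-column offsets [(-1, 1), (0, 1), (1, 1)], then 9 and 3. Once the six reused pixels are kept in registers, the remaining accesses differ only in the row index, so the partition vector (1, 0), which has a zero component, gives a conflict-free 3-bank layout. Filling the register chains at the start of each row, and the padding that the paper revises for zero-component partition vectors, are not modeled here.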

Bibliographic Information

Source
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2017, Issue 10, pp. 1674-1687 (14 pages)
Authors
Jincheng Su; Fan Yang; Xuan Zeng; Dian Zhou; Jie Chen
Affiliations

State Key Laboratory of ASIC and Systems, School of Microelectronics, Fudan University, Shanghai, China;

State Key Laboratory of ASIC and Systems, School of Microelectronics, Fudan University, Shanghai, China;

State Key Laboratory of ASIC and Systems, School of Microelectronics, Fudan University, Shanghai, China;

Fudan University, Shanghai, China;

University of Alberta, Edmonton, AB, Canada;

Original format: PDF
Language: English
Keywords
Partitioning algorithms; System-on-chip; Random access memory; Arrays; Registers; Memory management; Field programmable gate arrays;

Similar Documents

1. An Efficient Memory Partitioning Approach for Multi-Pattern Data Access via Data Reuse [J]. Li Wensong, Yang Fan, Zhu Hengliang. ACM Transactions on Reconfigurable Technology and Systems, 2019, Issue 1.
2. Combining Data Reuse With Data-Level Parallelization for FPGA-Targeted Hardware Compilation: A Geometric Programming Framework [J]. Liu Q., Constantinides G. A., Masselos K. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2009, Issue 3.
3. Generating Efficient Local Memory Access Sequences for Coupled Subscripts in Data-Parallel Programs [J]. Huang T.-C., Shiu L.-C., Lin Y.-J. Information Sciences: An International Journal, 2003, Issue 4.
4. Efficient Memory Partitioning for Parallel Data Access in Multidimensional Arrays [C]. Chenyue Meng, Shouyi Yin, Peng Ouyang. ACM/EDAC/IEEE Design Automation Conference, 2015.
5. Supporting Applications Involving Dynamic Data Structures and Irregular Memory Access on Emerging Parallel Platforms [D]. Ren, Bin. 2014.
6. The DeSyGNER Data Access Element: A Readily Reusable Component for the Construction of Data-Compatible Multimedia Programs [O]. S. R. Deibel, R. A. Greenes. 1991.
7. Combining Data Reuse With Data-Level Parallelization for FPGA-Targeted Hardware Compilation: A Geometric Programming Framework [O]. Qiang Liu, George A. Constantinides. 2013.
8. Distributed Memory Compiler Methods for Irregular Problems: Data Copy Reuse and Runtime Partitioning [R]. Das, R., Ponnusamy, R., Saltz, J. 1991.
