Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Efficient Memory Partitioning for Parallel Data Access in FPGA via Data Reuse

Abstract

Parallelizing the memory accesses in a nested loop is a critical challenge in enabling loop pipelining. An effective approach in high-level synthesis for field-programmable gate arrays (FPGAs) is to map these accesses to multiple on-chip memory banks using a memory partitioning technique. In this paper, we propose an efficient memory partitioning algorithm with low overhead and low time complexity for parallel data access via data reuse. We find that for most applications in image and video processing, a large amount of data can be reused among different iterations of a loop nest. Motivated by this observation, we propose to cache reusable data in on-chip registers organized as register chains. The nonreusable data are then distributed across several memory banks by a memory partitioning algorithm. We revise the existing padding method to cover cases, frequent in our method, in which certain components of the partition vector are zero. Experimental results demonstrate that, compared with state-of-the-art algorithms, the proposed method is efficient in terms of execution time, resource overhead, and power consumption across a wide range of access patterns extracted from image and video processing applications. For the tested patterns, the execution time is typically less than one millisecond, and the number of required memory banks is reduced by 59.7% on average, which leads to average reductions of 78.2% in look-up tables, 65.5% in flip-flops, and 37.1% in DSP48Es, and consequently a 74.8% reduction in dynamic power consumption. Moreover, the storage overhead incurred by the proposed method is zero for the most widely used access patterns in image filtering.
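To make the idea in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a linear (cyclic) bank mapping of the form bank(x) = (alpha · x) mod N, a brute-force search for the smallest conflict-free N, and reuse only between consecutive iterations of the innermost loop. The names `bank_of`, `find_partition`, and `non_reusable` are hypothetical and chosen for illustration.

```python
from itertools import product

def bank_of(point, alpha, n_banks):
    # Linear bank mapping: dot product of the partition vector and the index, mod N.
    return sum(a * x for a, x in zip(alpha, point)) % n_banks

def find_partition(offsets, max_banks=64):
    # Brute-force search (an assumption, not the paper's method) for the smallest
    # bank count and a partition vector that send every offset to a distinct bank.
    dims = len(offsets[0])
    for n_banks in range(len(offsets), max_banks + 1):
        for alpha in product(range(n_banks), repeat=dims):
            banks = {bank_of(p, alpha, n_banks) for p in offsets}
            if len(banks) == len(offsets):
                return n_banks, alpha
    raise ValueError("no conflict-free partition found within max_banks")

def non_reusable(offsets):
    # An element read at offset (di, dj) in this iteration was already read at
    # offset (di, dj + 1) in the previous innermost iteration, if that offset is
    # in the pattern. Only offsets without such a predecessor need a memory read;
    # the rest could be served from a register chain.
    pattern = set(offsets)
    return [p for p in offsets if p[:-1] + (p[-1] + 1,) not in pattern]

# 3x3 image-filter window centered on the current pixel.
stencil = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]

print(find_partition(stencil))                # (9, (1, 3))
print(non_reusable(stencil))                  # [(-1, 1), (0, 1), (1, 1)]
print(find_partition(non_reusable(stencil)))  # (3, (1, 0))
```

In this toy case, serving six of the nine window elements from registers drops the bank count from nine to three, and the resulting partition vector (1, 0) has a zero component, which is the kind of case the abstract says the revised padding method must handle.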