Conference: IEEE High Performance Extreme Computing Conference

On-chip memory efficient data layout for 2D FFT on 3D memory integrated FPGA


Abstract

3D memories are becoming a viable solution to the memory wall problem and a way to meet the bandwidth requirements of memory-intensive applications. However, the high bandwidth provided by 3D memories does not translate into a proportional performance increase for all applications. For an application such as 2D FFT, which has strided access patterns, the data layout in memory has a significant impact on the total execution time of the implementation. In this paper, we present a data layout for 2D FFT on 3D memory integrated FPGA that is both on-chip memory efficient and throughput-optimal. Our data layout ensures that consecutive accesses to 3D memory are sufficiently interleaved among layers and vaults to absorb the latency due to activation overheads for both sequential (Row FFT) and strided (Column FFT) accesses. The current state-of-the-art implementation on 3D memory requires O(√c · N) on-chip memory to reduce the strided accesses and achieve maximum bandwidth, for an N × N FFT problem size and c columns in a 3D memory bank row. Our proposed data layout optimizes the throughput of both the Row FFT and Column FFT phases of 2D FFT with O(N) on-chip memory for the same problem size and memory parameters, without decreasing the memory bandwidth, thereby achieving a √c× reduction in on-chip memory. On architectures with limited on-chip memory, our data layout achieves a 2× to 4× improvement in execution time compared with the state-of-the-art 2D FFT implementation on 3D memory.
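
To make the strided-access problem concrete, the following minimal Python sketch counts memory row activations for the Row FFT and Column FFT phases when an N × N matrix is stored in a plain row-major layout. It is not the data layout proposed in the paper. The model is a deliberate simplification and every name in it is hypothetical: a single bank with one open row at a time, c elements per memory row, and one activation whenever consecutive accesses fall in different memory rows. Layers, vaults, and timing are not modeled.

```python
def count_row_activations(addresses, c):
    """Count activations under an assumed open-row model.

    An activation is charged whenever an access falls in a different
    memory row (addr // c) than the previous access. The first access
    always opens a row.
    """
    activations = 0
    open_row = None
    for addr in addresses:
        row = addr // c
        if row != open_row:
            activations += 1
            open_row = row
    return activations


def row_phase_addresses(n):
    """Row FFT phase: walk each matrix row left to right (sequential)."""
    return (r * n + col for r in range(n) for col in range(n))


def column_phase_addresses(n):
    """Column FFT phase: walk each matrix column top to bottom (stride n)."""
    return (r * n + col for col in range(n) for r in range(n))


if __name__ == "__main__":
    n, c = 64, 16  # illustrative sizes only, not taken from the paper
    print("row phase activations:   ",
          count_row_activations(row_phase_addresses(n), c))
    print("column phase activations:",
          count_row_activations(column_phase_addresses(n), c))
```

Under these assumptions, with N = 64 and c = 16, the sequential phase opens a new memory row once every c accesses, while the strided phase opens one on every access. This c-fold gap is the kind of activation overhead that a layout-aware design tries to hide.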
