Journal: ACM Transactions on Reconfigurable Technology and Systems

Impact of Parallelism and Memory Architecture on FPGA Communication Energy

Abstract

The energy in FPGA computations is dominated by data communication energy, either in the form of memory references or data movement on interconnect. In this article, we explore how to use data placement and parallelism to reduce communication energy. We show that parallelism can reduce energy and that the optimal level of parallelism increases with the problem size. We further explore how FPGA memory architecture (memory block size(s), memory banking, and spacing between memory banks) can impact communication energy, and determine how to organize the memory architecture to guarantee that the energy overhead compared to the optimally matched architecture for the design is never more than 60%. We specifically show that an architecture with 32 bit wide, 16Kb internally banked memories placed every 8 columns of 10 4-LUT logic blocks is within 61% of the optimally matched architecture across the VTR 7 benchmark set and a set of parallelism-tunable benchmarks. Without internal banking, the worst-case overhead is 98%, achieved with an architecture with 32 bit wide, 8Kb memories placed every 9 columns, roughly comparable to the memory organization on the Cyclone V (where memories are placed about every 10 columns). Monolithic 32 bit wide, 16Kb memories placed every 10 columns (comparable to 18Kb and 20Kb memories used in Virtex 4 and Stratix V FPGAs) have a 180% worst-case energy overhead. Furthermore, we show practical cases where designs mapped for optimal parallelism use 4.7x less energy than designs using a single processing element.
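The percentages and the 4.7x factor quoted in the abstract can be put on a common scale with a little arithmetic. The sketch below assumes that "energy overhead" means the relative excess over the optimally matched architecture, (E - E_opt) / E_opt; the abstract does not spell out this definition, and the function names are hypothetical helpers introduced only for illustration.

```python
def overhead_percent(energy, optimal_energy):
    """Relative energy overhead versus the optimal architecture, in percent.

    Assumed definition (not stated in the abstract):
    100 * (E - E_opt) / E_opt.
    """
    return 100.0 * (energy - optimal_energy) / optimal_energy


def energy_ratio_from_overhead(overhead_pct):
    """Convert an overhead percentage into the ratio E / E_opt."""
    return 1.0 + overhead_pct / 100.0


def savings_percent_from_factor(factor):
    """Convert 'uses factor-x less energy' into a percentage reduction."""
    return 100.0 * (1.0 - 1.0 / factor)


# Worst-case overheads quoted in the abstract, keyed by a short label.
quoted_overheads = {
    "16Kb internally banked, every 8 columns": 61.0,
    "8Kb without internal banking, every 9 columns": 98.0,
    "16Kb monolithic, every 10 columns": 180.0,
}

for label, pct in quoted_overheads.items():
    ratio = energy_ratio_from_overhead(pct)
    print(f"{label}: at most {ratio:.2f}x the optimal energy")

# The 4.7x figure for optimal parallelism versus a single processing element.
print(f"4.7x less energy = {savings_percent_from_factor(4.7):.1f}% reduction")
```

Under this reading, a 180% overhead means using up to 2.8 times the energy of the optimally matched architecture, while a 61% overhead means up to 1.61 times.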
