Venue: International Conference on Field Programmable Logic and Applications

RapidLayout: Fast Hard Block Placement of FPGA-Optimized Systolic Arrays using Evolutionary Algorithms

Abstract

Evolutionary algorithms can outperform conventional placement approaches, including simulated annealing, analytical placement, and manual placement, on metrics such as runtime, wirelength, pipelining cost, and clock frequency when mapping hard-block-intensive FPGA designs such as systolic arrays onto Xilinx UltraScale+ FPGAs. For certain hard-block-intensive systolic array accelerator designs, the commercial-grade Xilinx Vivado CAD tool is unable to provide a legal routing solution without tedious manual placement constraints. Instead, we formulate automatic FPGA placement of these hard blocks as a multi-objective optimization problem that targets wirelength squared and maximum bounding box size metrics. We build an end-to-end placement and routing flow called RapidLayout using the Xilinx RapidWright framework. RapidLayout runs 5-6 times faster than Vivado with manual constraints and eliminates the weeks-long effort to generate placement constraints manually for the hard blocks. We also perform automated post-placement pipelining of the long wires inside each convolution block to target 650 MHz URAM-limited operation. RapidLayout outperforms (1) the simulated annealer in VPR by 33% in runtime, 1.9-2.4 times in wirelength, and 3-4 times in bounding box size, and (2) the analytical placer UTPlaceF by 9.3 times in runtime, 1.8-2.2 times in wirelength, and 2-2.7 times in bounding box size. We employ transfer learning from a base FPGA device to speed up placement optimization for similar FPGA devices in the UltraScale+ family by 11-14 times compared with learning the placements from scratch.
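
The two objectives named in the abstract, wirelength squared and maximum bounding box size, can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: placements are grid coordinates, wirelength is Manhattan distance over two-pin connections, the bounding box is measured as a half-perimeter per convolution block, and the names `wirelength_squared`, `max_bounding_box`, and `dominates` are hypothetical rather than part of RapidLayout or RapidWright.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int]  # assumed (column, row) of a hard block site


def wirelength_squared(placement: Dict[str, Coord],
                       nets: List[Tuple[str, str]]) -> int:
    """Sum of squared Manhattan distances over two-pin connections (assumed metric)."""
    total = 0
    for src, dst in nets:
        (x0, y0), (x1, y1) = placement[src], placement[dst]
        total += (abs(x0 - x1) + abs(y0 - y1)) ** 2
    return total


def max_bounding_box(placement: Dict[str, Coord],
                     groups: Dict[str, List[str]]) -> int:
    """Largest half-perimeter bounding box over all block groups (assumed metric)."""
    worst = 0
    for members in groups.values():
        xs = [placement[m][0] for m in members]
        ys = [placement[m][1] for m in members]
        worst = max(worst, (max(xs) - min(xs)) + (max(ys) - min(ys)))
    return worst


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical toy instance: one convolution block with three hard blocks.
placement = {"dsp0": (0, 0), "bram0": (2, 1), "uram0": (5, 4)}
nets = [("dsp0", "bram0"), ("bram0", "uram0")]
groups = {"conv0": ["dsp0", "bram0", "uram0"]}

objectives = (wirelength_squared(placement, nets),
              max_bounding_box(placement, groups))
print(objectives)
```

A multi-objective evolutionary search would evaluate a pair like `objectives` for each candidate placement and use a comparison such as `dominates` to decide which candidates survive; which evolutionary algorithm the authors use is not stated in the abstract.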
