Journal: ACM Transactions on Reconfigurable Technology and Systems

Fast and Cycle-Accurate Emulation of Large-Scale Networks-on-Chip Using a Single FPGA


Abstract

Modeling and simulation/emulation play a major role in research and development of novel Networks-on-Chip (NoCs). However, conventional software simulators are so slow that studying NoCs for emerging many-core systems with hundreds to thousands of cores is challenging. State-of-the-art FPGA-based NoC emulators have shown great potential in speeding up the NoC simulation, but they cannot emulate large-scale NoCs due to the FPGA capacity constraints. Moreover, emulating large-scale NoCs under synthetic workloads on FPGAs typically requires a large amount of memory and thus involves the use of off-chip memory, which makes the overall design much more complicated and may substantially degrade the emulation speed. This article presents methods for fast and cycle-accurate emulation of NoCs with up to thousands of nodes using a single FPGA. We first describe how to emulate a NoC under a synthetic workload using only FPGA on-chip memory (BRAMs). We next present a novel use of time-division multiplexing where BRAMs are effectively used for emulating a network using a small number of nodes, thereby overcoming the FPGA capacity constraints. We propose methods for emulating both direct and indirect networks, focusing on the commonly used meshes and fat-trees (k-ary n-trees). This is different from prior work that considers only direct networks. Using the proposed methods, we build a NoC emulator, called FNoC, and demonstrate the emulation of some mesh-based and fat-tree-based NoCs with canonical router architectures. 
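
To make the time-division multiplexing idea concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not FNoC's actual design: the ring topology, the `node_logic` function, and both `emulate_*` helpers are hypothetical names introduced only for illustration. A single "physical" node is reused for every logical node, one time slot each, while per-node state lives in a memory array that stands in for BRAM. A second buffer holds next-cycle state so that every logical node sees only previous-cycle values, which is what keeps the result identical to a fully parallel implementation.

```python
# Toy sketch of time-division multiplexing (hypothetical model, not FNoC's design).
# Each logical node on a ring holds one value and, every emulated cycle,
# replaces it with its left neighbour's value plus one.

def node_logic(left_value):
    """Logic of a single (hypothetical) node."""
    return left_value + 1


def emulate_tdm(initial_state, cycles):
    """Emulate all logical nodes with one physical node, one node per time slot."""
    n = len(initial_state)
    current = list(initial_state)  # state memory (stands in for BRAM)
    nxt = [0] * n                  # second buffer for next-cycle state
    for _ in range(cycles):
        for node in range(n):      # one time slot per logical node
            nxt[node] = node_logic(current[(node - 1) % n])
        current, nxt = nxt, current  # all logical nodes advance together
    return current


def emulate_parallel(initial_state, cycles):
    """Reference model: every node updates at once, as dedicated hardware would."""
    state = list(initial_state)
    n = len(state)
    for _ in range(cycles):
        state = [node_logic(state[(i - 1) % n]) for i in range(n)]
    return state


if __name__ == "__main__":
    start = [0, 10, 20, 30]
    print(emulate_tdm(start, 3))
    print(emulate_parallel(start, 3))
```

In this sketch one emulated cycle costs as many time slots as there are logical nodes, so logic area is traded for emulation time while the state storage grows with the network size.
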
Our evaluation results show that (1) the size of the largest NoC that can be emulated depends only on the FPGA on-chip memory capacity; (2) a mesh-based NoC with 16,384 nodes (128 × 128 NoC) and a fat-tree-based NoC with 6,144 switch nodes and 4,096 terminal nodes (4-ary 6-tree NoC) can be emulated using a single Virtex-7 FPGA; and (3) when emulating these two NoCs, we achieve, respectively, 5,047x and 232x speedups over BookSim, one of the most widely used software-based NoC simulators, while maintaining the same level of accuracy.
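
The node counts quoted above follow from the standard definitions of the two topologies: a mesh with R rows and C columns has R × C nodes, and a k-ary n-tree has k^n terminal nodes and n · k^(n-1) switch nodes. The short Python check below reproduces the figures; the function names are illustrative and do not come from the paper.

```python
def mesh_nodes(rows, cols):
    """Number of nodes in a rows x cols 2D mesh."""
    return rows * cols


def fat_tree_counts(k, n):
    """Return (switch_nodes, terminal_nodes) for a k-ary n-tree."""
    terminals = k ** n
    switches = n * k ** (n - 1)
    return switches, terminals


if __name__ == "__main__":
    print(mesh_nodes(128, 128))    # 16384
    print(fat_tree_counts(4, 6))   # (6144, 4096)
```
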
