Journal: ACM Transactions on Modeling and Computer Simulation

PDES-A: Accelerators for Parallel Discrete Event Simulation Implemented on FPGAs



Abstract

In this article, we present our experience implementing a general Parallel Discrete Event Simulation (PDES) accelerator on a Field Programmable Gate Array (FPGA). The accelerator can be specialized to any particular simulation model by defining the object states and the event handling code, which are then synthesized into a custom accelerator for the given model. The accelerator consists of several event processors that can process events in parallel while maintaining the dependencies between them. Events are automatically sorted by a self-sorting event queue. The accelerator supports optimistic simulation by automatically keeping track of event history and supporting rollbacks. Within a single accelerator, scalability is limited by the communication and port bandwidth of the different structures. However, the architecture is designed to allow multiple accelerators to be connected to scale up the simulation. We evaluate the design and explore several design trade-offs and optimizations. We show that the accelerator scales up to 64 concurrent event processors, measured relative to the performance of a single event processor. At this point, scalability becomes limited by contention on the shared structures within the datapath. To alleviate this bottleneck, we also develop a new version of the datapath that partitions the state and event space of the simulation but allows these partitions to share the use of the event processors. The new design substantially reduces contention and improves the performance with 64 processors from 49x to 62x relative to a single-processor design. We went through two iterations of the design of PDES-A, first using Verilog and then using Chisel (for the partitioned version of the design). We report in this article some observations on the differences in prototyping accelerators using these two languages. PDES-A outperforms the ROSS simulator running on a 12-core Intel Xeon machine by a factor of 3.2 while consuming less than 15% of the power. Our future work includes building multiple interconnected PDES-A cores.
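The optimistic scheme mentioned in the abstract (process events speculatively, keep event history, roll back when an earlier event arrives late) can be illustrated in software. The following is a minimal sketch under stated assumptions, not the PDES-A hardware design: the class `LogicalProcess`, the function `handle`, and the toy counter state are all hypothetical, Python's `heapq` stands in for the self-sorting event queue, and events here do not generate further events, so no cancellation messages are modeled.

```python
import heapq
from dataclasses import dataclass, field


def handle(state, delta):
    # Hypothetical, order-dependent event handler: processing the same
    # events in a different order gives a different final state.
    return state * 2 + delta


@dataclass
class LogicalProcess:
    """Toy simulation object with the history needed for rollback."""
    state: int = 0
    pending: list = field(default_factory=list)  # min-heap of (timestamp, delta)
    history: list = field(default_factory=list)  # (timestamp, delta, state_before)
    rollbacks: int = 0

    def schedule(self, ts, delta):
        # A straggler is an event older than work that was already processed.
        if self.history and ts < self.history[-1][0]:
            self.rollback(ts)
        heapq.heappush(self.pending, (ts, delta))

    def rollback(self, ts):
        # Undo every processed event newer than ts and queue it again.
        self.rollbacks += 1
        while self.history and self.history[-1][0] > ts:
            old_ts, old_delta, state_before = self.history.pop()
            self.state = state_before
            heapq.heappush(self.pending, (old_ts, old_delta))

    def process_next(self):
        # Pop the earliest pending event, save the prior state, apply it.
        ts, delta = heapq.heappop(self.pending)
        self.history.append((ts, delta, self.state))
        self.state = handle(self.state, delta)


lp = LogicalProcess()
lp.schedule(10, 1)
lp.schedule(30, 3)
lp.process_next()   # timestamp 10
lp.process_next()   # timestamp 30, processed speculatively
lp.schedule(20, 2)  # straggler: forces a rollback of timestamp 30
while lp.pending:
    lp.process_next()
print(lp.state, lp.rollbacks)
```

After the rollback, the events end up applied in timestamp order (10, 20, 30), so the final state matches what a purely sequential run would produce.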
