Journal: ACM Transactions on Modeling and Computer Simulation

PDES-A: Accelerators for Parallel Discrete Event Simulation Implemented on FPGAs

Abstract

In this article, we present our experiences implementing a general Parallel Discrete Event Simulation (PDES) accelerator on a Field Programmable Gate Array (FPGA). The accelerator can be specialized to any particular simulation model by defining the object states and the event handling code, which are then synthesized into a custom accelerator for the given model. The accelerator consists of several event processors that process events in parallel while maintaining the dependencies between them. Events are automatically sorted by a self-sorting event queue. The accelerator supports optimistic simulation by automatically keeping track of event history and supporting rollbacks. Within a single accelerator, scalability is limited by the communication and port bandwidth of the different structures. However, the architecture is designed to allow multiple accelerators to be connected to scale up the simulation. We evaluate the design and explore several design trade-offs and optimizations. We show that the accelerator's performance, measured relative to a single event processor, scales up to 64 concurrent event processors. At this point, scalability becomes limited by contention on the shared structures within the datapath. To alleviate this bottleneck, we also develop a new version of the datapath that partitions the state and event space of the simulation but allows these partitions to share the event processors. The new design substantially reduces contention and improves the performance with 64 processors from 49x to 62x relative to a single-processor design. We went through two iterations of the design of PDES-A, first using Verilog and then using Chisel (for the partitioned version of the design). We report some observations on the differences between prototyping accelerators in these two languages. PDES-A outperforms the ROSS simulator running on a 12-core Intel Xeon machine by a factor of 3.2x while consuming less than 15% of the power. Our future work includes building multiple interconnected PDES-A cores.
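The optimistic execution with event history and rollback mentioned in the abstract can be illustrated with a small software sketch. This is not the PDES-A hardware design: the names `Event` and `LogicalProcess`, the toy event handler, and the use of a binary heap in place of the self-sorting event queue are all assumptions made for illustration, and the cancellation of already-sent events that a full optimistic simulator would need is omitted.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Events are ordered by timestamp only (hypothetical layout).
    timestamp: int
    payload: int = field(compare=False)


class LogicalProcess:
    """One simulation object that executes events optimistically."""

    def __init__(self):
        self.state = 0       # toy model state
        self.lvt = 0         # local virtual time of the last processed event
        self.pending = []    # min-heap standing in for the sorted event queue
        self.history = []    # (event, state_before, lvt_before) per processed event
        self.rollbacks = 0

    def schedule(self, event):
        # A straggler is older than an event that was already processed,
        # so the speculative work done past its timestamp must be undone.
        if event.timestamp < self.lvt:
            self._rollback(event.timestamp)
        heapq.heappush(self.pending, event)

    def _rollback(self, to_time):
        # Undo processed events newer than to_time, most recent first,
        # restoring the saved state and re-queueing each undone event.
        while self.history and self.history[-1][0].timestamp > to_time:
            event, state_before, lvt_before = self.history.pop()
            self.state = state_before
            self.lvt = lvt_before
            heapq.heappush(self.pending, event)
        self.rollbacks += 1

    def process_next(self):
        if not self.pending:
            return False
        event = heapq.heappop(self.pending)
        self.history.append((event, self.state, self.lvt))
        # Toy handler whose result depends on event order.
        self.state = self.state * 2 + event.payload
        self.lvt = event.timestamp
        return True

    def run(self):
        while self.process_next():
            pass


lp = LogicalProcess()
lp.schedule(Event(10, 1))
lp.schedule(Event(30, 3))
lp.run()                    # speculatively processes t=10 and t=30
lp.schedule(Event(20, 2))   # straggler: undoes the event at t=30
lp.run()                    # re-executes t=20, then t=30
print(lp.state, lp.rollbacks)
```

Because the toy handler is order dependent, the final state matches an in-order execution of the three events only if the rollback restores the saved state correctly. Saving a full state copy per event is the simplest form of event history; it is chosen here for brevity, not because the source describes it.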