Proceedings of the Workshop on Principles of Advanced and Distributed Simulation

Multi-level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs



Abstract

Developing complex technical systems requires a systematic exploration of the given design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires a large number of simulation runs. This in turn results in excessive runtime demands that severely hamper thorough design space exploration. In this paper, we present a parallel discrete event simulation scheme that enables cost- and time-efficient execution of large-scale parameter studies on GPUs. To efficiently accommodate the stream-processing paradigm of GPUs, our parallelization scheme exploits two orthogonal levels of parallelism: external parallelism among the inherently independent simulations of a parameter study, and internal parallelism among independent events within each individual simulation. Specifically, we design an event aggregation strategy based on external parallelism that generates workloads suitable for GPUs. In addition, we define a pipelined event execution mechanism based on internal parallelism to hide the latency of transfers between host and GPU memory. We analyze the performance characteristics of our parallelization scheme by means of a prototype implementation and show a 25-fold performance improvement over purely CPU-based execution.
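The abstract does not include the implementation. The following is a minimal Python sketch, not the authors' code, of the two ideas under stated assumptions. `run_parameter_study` illustrates event aggregation over external parallelism: each simulation of the parameter study keeps its own future event list, and every step gathers the earliest pending event of each still-running simulation into one batch. These events are mutually independent, so a GPU could process them in parallel. The GPU kernel is replaced by a plain loop (`batched_handler`). The simulation model is a hypothetical toy periodic process whose period is the studied parameter. `pipelined_makespan` is a generic two-stage double-buffering timing model, assuming one copy engine and one execution stream. It shows how overlapping the transfer of the next batch with execution of the current one hides transfer latency. All function names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float                          # timestamp; the only field used for ordering
    sim_id: int = field(compare=False)   # which simulation of the parameter study owns it


def batched_handler(batch, periods, counts, end_time):
    """Stand-in for a GPU kernel: applies one handler to every event in the batch.

    All events in a batch belong to different simulations, so they are independent.
    Toy model (hypothetical): each simulation is a periodic process whose period is
    the parameter under study; events at or beyond end_time are not scheduled.
    """
    follow_ups = []
    for ev in batch:
        counts[ev.sim_id] += 1
        t_next = ev.time + periods[ev.sim_id]
        if t_next < end_time:
            follow_ups.append(Event(t_next, ev.sim_id))
    return follow_ups


def run_parameter_study(periods, end_time):
    """Event aggregation: batch the earliest pending event of every running simulation."""
    fels = [[Event(0.0, i)] for i in range(len(periods))]  # one future event list per run
    counts = [0] * len(periods)
    batch_sizes = []
    while any(fels):  # loop until every future event list is empty
        batch = [heapq.heappop(fel) for fel in fels if fel]
        batch_sizes.append(len(batch))
        for ev in batched_handler(batch, periods, counts, end_time):
            heapq.heappush(fels[ev.sim_id], ev)
    return counts, batch_sizes


def pipelined_makespan(transfer, compute):
    """Two-stage pipeline timing: batch k+1 is copied while batch k executes."""
    copy_done = exec_done = 0.0
    for t_copy, t_exec in zip(transfer, compute):
        copy_done += t_copy                             # single copy engine, in order
        exec_done = max(exec_done, copy_done) + t_exec  # needs its data and a free device
    return exec_done


counts, sizes = run_parameter_study([1.0, 2.5, 0.5], end_time=10.0)
print(counts)                                      # [10, 4, 20]
print(sizes[:5])                                   # [3, 3, 3, 3, 2]
print(pipelined_makespan([2, 2, 2], [3, 3, 3]))    # 11.0 (vs. 15 without overlap)
```

Batch sizes shrink as individual simulations finish, which hints at why aggregating across many independent runs is what keeps GPU workloads large. The pipeline model only overlaps transfer with execution; it does not capture the dependency analysis needed to find independent events inside a single simulation.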
