Published in: 47th ACM/IEEE Design Automation Conference

Efficient Smart Monte Carlo Based SSTA on Graphics Processing Units with Improved Resource Utilization

Abstract

To exploit the benefits of throughput-optimized processors such as GPUs, applications need to be redesigned to achieve performance and efficiency. In this work, we present techniques to speed up statistical timing analysis on throughput processors. We draw upon advances in improving the efficiency of Monte Carlo based statistical static timing analysis (MC SSTA) using sample-size reduction, or smart sampling, techniques. An efficient smart sampling technique, Stratification + Hybrid Quasi Monte Carlo (SH-QMC), is implemented on a GPU based on the NVIDIA CUDA architecture. We show that although this application is based on MC analysis with straightforward parallelism available, achieving performance and efficiency on the GPU requires exposing more parallelism and finding locality in computations. This is in contrast with random sampling based algorithms, which are inefficient in terms of sample size but can keep resources utilized on a GPU. We show that SH-QMC implemented on a multi-GPU setup is twice as fast as a single STA run on a CPU for the benchmark circuits considered. In terms of an efficiency metric, which measures the ability to convert a reduction in sample size into a corresponding reduction in runtime with respect to a random sampling approach, we achieve 73.9% efficiency with the proposed approaches, compared to 4.3% for an implementation that performs the computations on smart samples in parallel. Another contribution of the paper is a critical graph analysis technique that improves the efficiency of Monte Carlo based SSTA, leading to a further 2x to 9x speedup.
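As a point of reference for the baseline the abstract compares against, here is a minimal sketch of Monte Carlo SSTA with plain random sampling. It is not the paper's SH-QMC method or its GPU implementation. The timing graph, the delay values, the Gaussian variation model, and the names `GRAPH`, `sta`, and `mc_ssta` are all assumptions made for illustration. Each sample is one independent deterministic STA pass, which is the "straightforward parallelism" the abstract refers to.

```python
import random

# Hypothetical timing graph (all values are made up for illustration):
# node -> list of (fan-in node, nominal edge delay).
GRAPH = {
    "a": [],
    "b": [],
    "c": [("a", 1.0), ("b", 1.5)],
    "d": [("a", 2.0), ("c", 1.0)],
}
ORDER = ["a", "b", "c", "d"]  # topological order of the nodes
SINK = "d"


def sta(scale):
    """One deterministic STA pass: arrival = max over fan-ins of (arrival + delay)."""
    arrival = {}
    for node in ORDER:
        arrival[node] = max(
            (arrival[src] + delay * scale[(src, node)] for src, delay in GRAPH[node]),
            default=0.0,  # primary inputs arrive at time 0
        )
    return arrival[SINK]


def mc_ssta(num_samples, sigma=0.1, seed=0):
    """Plain random-sampling MC SSTA: one independent STA pass per sample.

    Assumed variation model: each edge delay is scaled by an independent
    Gaussian factor with mean 1 and standard deviation sigma, clamped at 0.
    """
    rng = random.Random(seed)
    edges = [(src, node) for node in ORDER for src, _ in GRAPH[node]]
    delays = []
    for _ in range(num_samples):
        scale = {edge: max(0.0, rng.gauss(1.0, sigma)) for edge in edges}
        delays.append(sta(scale))
    return delays


if __name__ == "__main__":
    samples = mc_ssta(10_000)
    print("mean circuit delay:", sum(samples) / len(samples))
```

Because the samples do not depend on each other, the loop in `mc_ssta` could be mapped to one GPU thread per sample. The abstract's point is that once smart sampling shrinks the sample count, this mapping alone no longer keeps the GPU busy.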
