Conference: ACM/IEEE Design Automation Conference

Efficient Smart Monte Carlo based SSTA on Graphics Processing Units with Improved Resource Utilization


Abstract

To exploit the benefits of throughput-optimized processors such as GPUs, applications need to be redesigned to achieve performance and efficiency. In this work, we present techniques to speed up statistical timing analysis on throughput processors. We draw upon advances that improve the efficiency of Monte Carlo based statistical static timing analysis (MC SSTA) by reducing the sample size, known as smart sampling techniques. An efficient smart sampling technique, Stratification + Hybrid Quasi Monte Carlo (SH-QMC), is implemented on a GPU based on the NVIDIA CUDA architecture. We show that although this application is based on MC analysis, where straightforward parallelism is available, achieving performance and efficiency on the GPU requires exposing more parallelism and finding locality in computations. This is in contrast with random sampling based algorithms, which are inefficient in terms of sample size but can keep the resources of a GPU utilized. We show that SH-QMC implemented on a multi-GPU system is twice as fast as a single STA on a CPU for the benchmark circuits considered. In terms of an efficiency metric, which measures the ability to convert a reduction in sample size into a corresponding reduction in runtime with respect to a random sampling approach, we achieve 73.9% efficiency with the proposed approaches, compared to 4.3% for an implementation that simply performs the computations on smart samples in parallel. Another contribution of the paper is a critical graph analysis technique that improves the efficiency of Monte Carlo based SSTA, leading to a further 2-9X speedup.
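For readers unfamiliar with MC SSTA, the baseline idea the abstract starts from (plain random sampling, not SH-QMC) can be sketched roughly as follows. This is only an illustration under stated assumptions: the four-gate timing graph, the delay values, and the names `FANIN`, `NOMINAL`, `SIGMA`, `sta`, and `mc_ssta` are all hypothetical and do not come from the paper. Each sample draws a delay for every gate and then runs one deterministic STA pass. Because the samples are independent, this is the "straightforward parallelism" the abstract refers to.

```python
import random

# Hypothetical 4-gate timing graph: gate -> list of fan-in gates.
# Gates are listed in topological order (drivers before the gates they feed).
FANIN = {
    "g1": [],
    "g2": [],
    "g3": ["g1", "g2"],
    "g4": ["g2", "g3"],
}
# Assumed nominal delay and standard deviation per gate (arbitrary units).
NOMINAL = {"g1": 1.0, "g2": 1.5, "g3": 2.0, "g4": 1.0}
SIGMA = {"g1": 0.1, "g2": 0.15, "g3": 0.2, "g4": 0.1}


def sta(delays):
    """One deterministic STA pass: arrival = max(fan-in arrivals) + gate delay."""
    arrival = {}
    for gate, fanin in FANIN.items():
        latest_input = max((arrival[f] for f in fanin), default=0.0)
        arrival[gate] = latest_input + delays[gate]
    return arrival["g4"]  # g4 is the only output of this toy circuit


def mc_ssta(num_samples, seed=0):
    """Random-sampling MC SSTA: one independent STA pass per sample."""
    rng = random.Random(seed)
    circuit_delays = []
    for _ in range(num_samples):
        # Assumption: independent Gaussian variation on every gate delay.
        delays = {g: rng.gauss(NOMINAL[g], SIGMA[g]) for g in FANIN}
        circuit_delays.append(sta(delays))
    return circuit_delays


if __name__ == "__main__":
    samples = mc_ssta(1000)
    print("mean circuit delay:", sum(samples) / len(samples))
```

A smart sampling method such as SH-QMC would replace the random draws with a much smaller, carefully constructed sample set. The abstract's point is that this smaller set leaves less obvious parallel work for the GPU, so additional parallelism and locality have to be found elsewhere.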
