Predicting Execution Time of CUDA Kernel Using Static Analysis

Abstract

With the growing demand for performance-oriented computing, programmers routinely execute the embarrassingly parallel parts of an application (GPU kernels) on a GPU in order to achieve significant speedup. These applications are becoming complex and long-running, which makes them energy inefficient. Anticipating a kernel's execution time can help developers fix inefficient code before running it. In this paper, we propose an approach to predict the execution time of a GPU kernel without executing it. We build an analytical model that predicts the execution time of a GPU kernel by analyzing the intermediate PTX code of a CUDA kernel. Our experimental analysis of a set of benchmarks shows that, for 45 applications, the estimated execution time has a mean absolute error of 26.86% when compared to the actual execution time. The mean absolute error is lowest for benchmarks belonging to the Dynamic Programming dwarf, followed by Dense Linear Algebra benchmarks.
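The abstract does not spell out the analytical model, so the following is only a rough sketch of the general idea of estimating time from PTX without running the kernel, and it is not the authors' method. It counts PTX instructions by class, weights each class with a placeholder latency, and scores predictions with a mean absolute percentage error, which is one plausible reading of the reported "mean absolute error of 26.86%". The opcode groupings, the latency values, the clock rate, the sample PTX fragment, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical opcode groupings, chosen for illustration only.
MEMORY_OPS = {"ld", "st", "atom", "red", "tex"}
CONTROL_OPS = {"bra", "call", "ret", "exit", "bar"}


def classify(opcode):
    """Map a PTX opcode such as 'ld.global.f32' to a coarse class."""
    base = opcode.split(".")[0]
    if base in MEMORY_OPS:
        return "memory"
    if base in CONTROL_OPS:
        return "control"
    return "compute"


def count_instruction_classes(ptx_text):
    """Count instructions per class, skipping directives, labels and braces."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//")[0].strip()  # drop trailing comments
        if not line or line in {"{", "}", "(", ")"}:
            continue
        if line.startswith(".") or line.endswith(":"):
            continue  # directive (.reg, .param, .entry) or label
        line = re.sub(r"^@!?%\w+\s+", "", line)  # strip a predicate guard
        opcode = line.split()[0].rstrip(";")
        counts[classify(opcode)] += 1
    return counts


def estimate_seconds(counts, latency_cycles, clock_hz):
    """Naive serial estimate: sum of (count * assumed latency) / clock rate."""
    cycles = sum(n * latency_cycles[c] for c, n in counts.items())
    return cycles / clock_hz


def mean_absolute_percentage_error(predicted, actual):
    """Average of |predicted - actual| / actual, expressed in percent."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


# Simplified PTX fragment for illustration, not a complete module.
SAMPLE_PTX = """
.visible .entry scale(
    .param .u64 scale_param_0
)
{
    .reg .f32 %f<3>;
    ld.param.u64 %rd1, [scale_param_0];
    ld.global.f32 %f1, [%rd1];
    add.f32 %f2, %f1, %f1;
    @%p1 bra DONE;
    st.global.f32 [%rd1], %f2;
DONE:
    ret;
}
"""

if __name__ == "__main__":
    counts = count_instruction_classes(SAMPLE_PTX)
    # Placeholder latencies in cycles and a placeholder 1 GHz clock.
    assumed_latency = {"memory": 400, "compute": 4, "control": 8}
    print(dict(counts))
    print(estimate_seconds(counts, assumed_latency, clock_hz=1e9))
```

A real model would also have to account for thread-level parallelism, latency hiding, and loop trip counts, none of which this serial sum captures.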

Bibliographic details

Source
2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications, 2018, pp. 948-955 (8 pages)
Conference location: Melbourne (AU)
Authors
Gargi Alavani; Kajal Varma; Santonu Sarkar
Original format: PDF
Language: English
Keywords
Graphics processing units; Kernel; Delays; Instruction sets; Hardware; Computational modeling; Predictive models
Indexed: 2022-08-26 14:32:09
