A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels

Abstract

Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today's systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular control flow and memory access patterns. However, the growing complexity, exposed memory hierarchy, incoherence, heterogeneity, and parallelism will make accelerator-based systems progressively more difficult to program. In the foreseeable future, the vast majority of programmers will no longer be able to extract additional performance or energy savings from next-generation systems because the programming will be too difficult. Automatic performance analysis and optimization recommendation tools have the potential to avert this situation. They embody expert knowledge and make it available to software developers when needed. In this paper, we describe and evaluate such a tool. It quantifies performance characteristics of GPU code through profiling, employs machine learning models to estimate the suitability and benefit of several known source-code optimizations, ranks the optimizations, and suggests the most promising ones to the user if the expected speedup is sufficiently high.
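
The final stage described in the abstract (rank the candidate optimizations, then suggest only those whose expected speedup is high enough) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `suggest_optimizations`, the optimization names, the predicted speedup values, the threshold `min_speedup`, and the cutoff `top_k` are all hypothetical and chosen only for illustration. The profiling step and the machine learning models are not shown; the sketch assumes their output is already available as a mapping from optimization name to predicted speedup.

```python
from typing import Dict, List, Tuple


def suggest_optimizations(
    predicted_speedups: Dict[str, float],
    min_speedup: float = 1.1,  # hypothetical threshold, not from the paper
    top_k: int = 3,            # hypothetical cap on the number of suggestions
) -> List[Tuple[str, float]]:
    """Rank candidate optimizations and keep only the promising ones."""
    # Sort candidates from highest to lowest predicted speedup.
    ranked = sorted(predicted_speedups.items(), key=lambda kv: kv[1], reverse=True)
    # Drop candidates whose predicted speedup does not clear the threshold.
    worthwhile = [(name, s) for name, s in ranked if s >= min_speedup]
    # Report at most top_k suggestions.
    return worthwhile[:top_k]


# Made-up model outputs for one kernel (assumed values, for illustration only).
predictions = {
    "loop_unrolling": 1.05,
    "shared_memory_tiling": 1.40,
    "memory_coalescing": 1.25,
}
suggestions = suggest_optimizations(predictions)
print(suggestions)
```

With these assumed inputs, the candidate below the threshold is filtered out and the remaining two are reported in descending order of predicted speedup. If no candidate clears the threshold, the function returns an empty list, which corresponds to making no suggestion.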

Bibliographic details

Source
International Conference on Parallel and Distributed Processing Techniques and Applications, 2015, pp. 589-598 (10 pages)
Authors
Saeed Taheri; Apan Qasem; Martin Burtscher
Original format: PDF
