Design, Automation & Test in Europe Conference & Exhibition

Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications



Abstract

We present a software approach to address the data latency issue for certain GPU applications. Each application is modeled as a kernel graph, where the nodes represent individual GPU kernels and the edges capture data dependencies. Our technique exploits the GPU L2 cache to accelerate parameter passing between kernels. The key idea is that, instead of having each kernel process the entire input in one invocation, we subdivide the input into fragments that fit in the cache and, ideally, process each fragment in one continuous sequence of kernel invocations. The proposed technique is oblivious to kernel functionality and requires minimal source-code modification. We demonstrate our technique on a full-fledged image processing application and improve performance by 30% on average across various settings.
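
Below is a minimal CUDA sketch of the tiled schedule described in the abstract. The kernel bodies, the two-stage chain (stageA -> stageB), the tile size, and the launch configuration are illustrative assumptions, not the paper's implementation.

// Sketch: process the input in L2-sized fragments, running the whole kernel
// chain on each fragment before moving to the next (assumed details).
#include <algorithm>
#include <cuda_runtime.h>

__global__ void stageA(const float* in, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = in[i] * 2.0f;   // placeholder producer stage
}

__global__ void stageB(const float* tmp, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + 1.0f;  // placeholder consumer stage
}

int main() {
    const int N       = 1 << 24;  // full input size
    const int TILE    = 1 << 18;  // fragment size chosen to fit in L2 (assumed)
    const int threads = 256;

    float *in, *tmp, *out;        // device buffers (left uninitialized in this sketch)
    cudaMalloc((void**)&in,  N * sizeof(float));
    cudaMalloc((void**)&tmp, N * sizeof(float));
    cudaMalloc((void**)&out, N * sizeof(float));

    // A baseline schedule would run stageA over all N elements and then
    // stageB over all N elements, so the intermediate tmp is evicted from L2
    // before stageB reads it. The tiled schedule below runs the whole kernel
    // chain on one fragment before moving to the next, so stageB's reads of
    // tmp are likely to hit in L2.
    for (int off = 0; off < N; off += TILE) {
        int n = std::min(TILE, N - off);
        int blocks = (n + threads - 1) / threads;
        stageA<<<blocks, threads>>>(in + off, tmp + off, n);
        stageB<<<blocks, threads>>>(tmp + off, out + off, n);
    }
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(tmp);
    cudaFree(out);
    return 0;
}

In practice the fragment size would be tuned to the target GPU's L2 capacity and to the combined working set of the kernels along each edge of the kernel graph.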
