Design, Automation & Test in Europe Conference & Exhibition

Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications



Abstract

We present a software approach to address the data latency issue for certain GPU applications. Each application is modeled as a kernel graph, where the nodes represent individual GPU kernels and the edges capture data dependencies. Our technique exploits the GPU L2 cache to accelerate parameter passing between kernels. The key idea is that, instead of having each kernel process the entire input in one invocation, we subdivide the input into fragments that fit in the cache and, ideally, process each fragment in one continuous sequence of kernel invocations. The proposed technique is oblivious to kernel functionality and requires minimal source-code modification. We demonstrate our technique on a full-fledged image processing application and improve performance by 30% on average across various settings.
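
Below is a minimal CUDA sketch of the tiled schedule described in the abstract. The kernel bodies, the two-stage chain (stageA -> stageB), the tile size, and the launch configuration are illustrative assumptions, not the paper's implementation.

// Sketch: process the input in L2-sized fragments, running the whole kernel
// chain on each fragment before moving to the next (assumed details).
#include <algorithm>
#include <cuda_runtime.h>

__global__ void stageA(const float* in, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = in[i] * 2.0f;   // placeholder producer stage
}

__global__ void stageB(const float* tmp, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + 1.0f;  // placeholder consumer stage
}

int main() {
    const int N       = 1 << 24;  // full input size
    const int TILE    = 1 << 18;  // fragment size chosen to fit in L2 (assumed)
    const int threads = 256;

    float *in, *tmp, *out;        // device buffers (left uninitialized in this sketch)
    cudaMalloc((void**)&in,  N * sizeof(float));
    cudaMalloc((void**)&tmp, N * sizeof(float));
    cudaMalloc((void**)&out, N * sizeof(float));

    // A baseline schedule would run stageA over all N elements and then
    // stageB over all N elements, so the intermediate tmp is evicted from L2
    // before stageB reads it. The tiled schedule below runs the whole kernel
    // chain on one fragment before moving to the next, so stageB's reads of
    // tmp are likely to hit in L2.
    for (int off = 0; off < N; off += TILE) {
        int n = std::min(TILE, N - off);
        int blocks = (n + threads - 1) / threads;
        stageA<<<blocks, threads>>>(in + off, tmp + off, n);
        stageB<<<blocks, threads>>>(tmp + off, out + off, n);
    }
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(tmp);
    cudaFree(out);
    return 0;
}

In practice the fragment size would be tuned to the target GPU's L2 capacity and to the combined working set of the kernels along each edge of the kernel graph.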
