Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

Abstract

Graphics processors, or GPUs, have recently been widely used as accelerators in the shared environments such as clusters and clouds. In such shared environments, many kernels are submitted to GPUs from different users, and throughput is an important metric for performance and total ownership cost. Despite the recently improved runtime support for concurrent GPU kernel executions, the GPU can be severely underutilized, resulting in suboptimal throughput. In this paper, we propose Kernelet, a runtime system with dynamic slicing and scheduling techniques to improve the throughput of concurrent kernel executions on the GPU. With slicing, Kernelet divides a GPU kernel into multiple sub-kernels (namely slices). Each slice has tunable occupancy to allow co-scheduling with other slices and to fully utilize the GPU resources. We develop a novel and effective Markov chain based performance model to guide the scheduling decision. Our experimental results demonstrate up to 31.1% and 23.4% performance improvement on NVIDIA Tesla C2050 and GTX680 GPUs, respectively.
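
The slicing idea in the abstract can be illustrated with a small host-side sketch. This is not Kernelet's implementation: the function names `slice_grid` and `round_robin`, the fixed slice size, and the round-robin ordering are all assumptions made for illustration. In particular, the abstract states that scheduling is guided by a Markov chain based performance model, which this sketch does not attempt to reproduce.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

# A slice is (block_offset, block_count): a contiguous range of thread
# blocks taken from the original kernel's grid. (Hypothetical representation.)
Slice = Tuple[int, int]


def slice_grid(total_blocks: int, slice_size: int) -> List[Slice]:
    """Split a grid of `total_blocks` thread blocks into contiguous slices.

    Every slice holds `slice_size` blocks except possibly the last one.
    """
    if total_blocks < 0 or slice_size <= 0:
        raise ValueError("total_blocks must be >= 0 and slice_size > 0")
    return [
        (offset, min(slice_size, total_blocks - offset))
        for offset in range(0, total_blocks, slice_size)
    ]


def round_robin(kernels: Dict[str, List[Slice]]) -> List[Tuple[str, int, int]]:
    """Interleave slices from several kernels in round-robin order.

    Placeholder policy only: it stands in for a real scheduler and is not
    the model-guided scheduling described in the abstract.
    """
    tagged = [
        [(name, offset, count) for offset, count in slices]
        for name, slices in kernels.items()
    ]
    order: List[Tuple[str, int, int]] = []
    for row in zip_longest(*tagged):
        order.extend(item for item in row if item is not None)
    return order


if __name__ == "__main__":
    plan = round_robin({"A": slice_grid(10, 4), "B": slice_grid(3, 2)})
    for name, offset, count in plan:
        print(f"launch kernel {name}: blocks [{offset}, {offset + count})")
```

In a real GPU runtime each slice would be launched as its own smaller grid, with the block offset passed to the kernel so that it can recover its original block index; the slice size is the knob that controls how much of the device one slice occupies.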
