Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

Abstract

Graphics processors, or GPUs, have recently been widely used as accelerators in shared environments such as clusters and clouds. In such shared environments, many kernels are submitted to GPUs from different users, and throughput is an important metric for performance and total ownership cost. Despite the recently improved runtime support for concurrent GPU kernel executions, the GPU can be severely underutilized, resulting in suboptimal throughput. In this paper, we propose Kernelet, a runtime system with dynamic slicing and scheduling techniques to improve the throughput of concurrent kernel executions on the GPU. With slicing, Kernelet divides a GPU kernel into multiple sub-kernels (namely slices). Each slice has tunable occupancy to allow co-scheduling with other slices and to fully utilize the GPU resources. We develop a novel and effective Markov chain based performance model to guide the scheduling decision. Our experimental results demonstrate up to 31.1% and 23.4% performance improvement on NVIDIA Tesla C2050 and GTX680 GPUs, respectively.
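
The slicing idea in the abstract can be illustrated with a minimal sketch. The abstract does not describe how Kernelet actually implements slicing, so everything here is an assumption: the sketch treats a kernel as a 1-D grid of thread blocks and cuts it into contiguous slices, each described by a block offset and a block count. The names `slice_grid` and `global_block_id` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def slice_grid(total_blocks: int, slice_size: int) -> List[Tuple[int, int]]:
    """Split a 1-D grid of thread blocks into (block_offset, num_blocks) slices.

    Hypothetical helper: each returned pair stands for one sub-kernel launch
    that covers blocks [block_offset, block_offset + num_blocks).
    """
    if total_blocks < 0 or slice_size <= 0:
        raise ValueError("total_blocks must be >= 0 and slice_size must be > 0")
    slices = []
    offset = 0
    while offset < total_blocks:
        # The last slice may be smaller than slice_size.
        count = min(slice_size, total_blocks - offset)
        slices.append((offset, count))
        offset += count
    return slices


def global_block_id(block_offset: int, local_block_id: int) -> int:
    """Recover the original block index inside a slice (assumed rebasing rule)."""
    return block_offset + local_block_id


if __name__ == "__main__":
    # A 10-block kernel cut into slices of at most 4 blocks.
    for block_offset, num_blocks in slice_grid(10, 4):
        print(block_offset, num_blocks)
```

In this sketch a smaller `slice_size` yields more, shorter slices, which is one way to picture the "tunable occupancy" the abstract mentions: a scheduler could interleave slices from different kernels instead of running each kernel to completion.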

Bibliographic details

Authors: Zhong, Jianlong; He, Bingsheng
Year: 2013
Original format: PDF

Similar documents

1. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling [J]. Zhong J., He B. IEEE Transactions on Parallel and Distributed Systems. 2014, Issue 6
2. Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU [J]. Chen Zhao, Wu Gao, Feiping Nie. Future Generation Computer Systems. 2020, Issue Nova
3. Using machine learning techniques to analyze the performance of concurrent kernel execution on GPUs [J]. Pablo Carvalho, Esteban Clua, Aline Paes. Future Generation Computer Systems. 2020, Issue Deca
4. Navigator: Dynamic Multi-kernel Scheduling to Improve GPU Performance [C]. Jiho Kim, John Kim, Yongjun Park. ACM/IEEE Design Automation Conference. 2020
5. GPU Resource Optimization and Scheduling for Shared Execution Environments [D]. Luley, Ryan. 2020
6. Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization [O]. A. Peter Ruymgaart, Ron Elber
7. 1 Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling [O]. Jianlong Zhong, Bingsheng He. 2016
