
Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution



Abstract

Contemporary GPUs support multiple kernels running concurrently on the same streaming multiprocessors (SMs). Recent studies have demonstrated that such concurrent kernel execution (CKE) improves both resource utilization and computational throughput. Most prior work focuses on partitioning GPU resources at the cooperative thread array (CTA) level or the warp-scheduler level to improve CKE. However, significant performance slowdown and unfairness are observed when latency-sensitive kernels co-run with bandwidth-intensive ones. The reason is that bandwidth over-subscription by bandwidth-intensive kernels severely aggravates memory access latency, which is highly detrimental to latency-sensitive kernels. Even among bandwidth-intensive kernels, the more intensive ones may unfairly consume much more bandwidth than the less intensive ones.
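The slowdown mechanism the abstract describes can be sketched with a toy contention model (not from the paper; the latency function and all constants are hypothetical, chosen only to illustrate why capping a bandwidth-intensive co-runner's share protects a latency-sensitive kernel):

```python
# Toy model (not from the paper): shows how uncontrolled bandwidth sharing
# inflates memory latency for a latency-sensitive kernel, and how capping
# the bandwidth-intensive co-runner's share restores it. All numbers are
# hypothetical illustration values.

def memory_latency(base_latency, demand, capacity):
    """Simple contention model: latency inflates as aggregate bandwidth
    demand approaches the memory system's capacity (queueing-style 1/(1-u))."""
    utilization = min(demand / capacity, 0.99)  # clamp at near-saturation
    return base_latency / (1.0 - utilization)

CAPACITY = 100.0   # aggregate DRAM bandwidth (arbitrary units)
BASE_LAT = 1.0     # uncontended memory latency (arbitrary units)

# Co-running kernels: one with modest demand (latency-sensitive) and one
# that over-subscribes the memory system (bandwidth-intensive).
latency_sensitive_demand = 10.0
bandwidth_intensive_demand = 120.0

# Uncoordinated sharing: both kernels see latency inflated by total demand.
shared = memory_latency(
    BASE_LAT, latency_sensitive_demand + bandwidth_intensive_demand, CAPACITY)

# Bandwidth partitioning: cap the intensive kernel so total demand stays
# below capacity, protecting the latency-sensitive kernel's accesses.
cap = 70.0
partitioned = memory_latency(
    BASE_LAT,
    latency_sensitive_demand + min(bandwidth_intensive_demand, cap),
    CAPACITY)

print(f"latency without partitioning: {shared:.1f}")
print(f"latency with partitioning:    {partitioned:.1f}")
```

In this toy setting, total demand of 130 against a capacity of 100 saturates the memory system and inflates latency by two orders of magnitude, while the capped co-run (80 of 100) keeps inflation to 5x; the real paper coordinates this partitioning with CTA combination rather than using a fixed cap.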
