Conference: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Collocating CPU-only Jobs with GPU-assisted Jobs on GPU-assisted HPC


Abstract

In recent years, GPUs have evolved rapidly and shown great potential for accelerating scientific applications, and massive GPU-assisted HPC systems have been deployed. However, as a heterogeneous system, a GPU-assisted HPC system is harder to program and utilize than a conventional CPU-only system. Statistics from the Keeneland system indicate that the effective utilization rate of computational resources is only about 40% when the system runs under normal conditions with enough jobs in its queue. Our theoretical model shows that the lack of overlap between CPU and GPU computation is a major obstacle to the efficient utilization of heterogeneous systems. In this paper, we evaluate the possibility of collocating a CPU-only job with a GPU-assisted job on the same node to increase the overlap between CPU and GPU computation and thus achieve better utilization. Several performance-compromising factors, such as resource isolation, CPU load, and GPU memory demand, are studied using workloads from popular MPI/CUDA benchmarks. The results indicate that, when those factors are managed properly, the collocated CPU-only job can efficiently scavenge the underutilized CPU resources without affecting the performance of either collocated job. Based on this insight, an experimental system with a collocation-aware job scheduler and resource manager is proposed. With our experimental workload pool of mixed CPU and GPU jobs, the system demonstrates a 15% gain in throughput and a 10% gain in both CPU and GPU utilization.
