In recent years, GPUs have evolved rapidly and shown great potential for accelerating scientific applications, and massive GPU-assisted HPC systems have been deployed. However, as heterogeneous systems, GPU-assisted HPC systems are harder to program and utilize than conventional CPU-only systems. Statistics from the Keeneland system indicate that the effective utilization rate of computational resources is only about 40% when the system runs under normal conditions with enough jobs in its queue. Our theoretical model shows that the lack of overlap between CPU and GPU computation is a major obstacle to the efficient utilization of heterogeneous systems. In this paper, we evaluate the possibility of collocating a CPU-only job with a GPU-assisted job on the same node to increase the overlap between CPU and GPU computation and thus achieve better utilization. Several performance-compromising factors, such as resource isolation, CPU load, and GPU memory demand, are studied using workloads from popular MPI/CUDA benchmarks. The results indicate that, when these factors are managed properly, the collocated CPU-only job can efficiently scavenge the underutilized CPU resources without degrading the performance of either collocated job. Based on this insight, we propose an experimental system with a collocation-aware job scheduler and resource manager. With our experimental workload pool of mixed CPU and GPU jobs, the system demonstrates a 15% gain in throughput and a 10% gain in both CPU and GPU utilization.