
Improving the management efficiency of GPU workloads in data centers through GPU virtualization


Abstract

Graphics processing units (GPUs) are currently used in data centers to reduce the execution time of compute-intensive applications. However, the use of GPUs presents several side effects, such as increased acquisition costs and larger space requirements. Furthermore, GPUs require a nonnegligible amount of energy even while idle. Additionally, GPU utilization is usually low for most applications. In a similar way to the use of virtual machines, using virtual GPUs may address the concerns associated with the use of these devices. In this regard, the remote GPU virtualization mechanism could be leveraged to share the GPUs present in the computing facility among the nodes of the cluster. This would increase overall GPU utilization, thus reducing the negative impact of the increased costs mentioned before. Reducing the amount of GPUs installed in the cluster could also be possible. However, in the same way as job schedulers map GPU resources to applications, virtual GPUs should also be scheduled before job execution. Nevertheless, current job schedulers are not able to deal with virtual GPUs. In this paper, we analyze the performance attained by a cluster using the remote Compute Unified Device Architecture middleware and a modified version of the Slurm scheduler, which is now able to assign remote GPUs to jobs. Results show that cluster throughput, measured as jobs completed per time unit, is doubled at the same time that the total energy consumption is reduced up to 40%. GPU utilization is also increased.
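The abstract above describes an architecture in which an unmodified CUDA application running on any node of the cluster is served by a GPU that may be installed in a different node: the remote Compute Unified Device Architecture middleware (rCUDA) intercepts the application's CUDA Runtime calls and forwards them over the network (for instance, InfiniBand) to the node hosting the assigned GPU, while the modified Slurm scheduler decides which remote GPU each job receives. The minimal sketch below illustrates this transparency. The vector-addition code is ordinary CUDA and is not taken from the paper; the RCUDA_DEVICE_COUNT and RCUDA_DEVICE_0 environment variables mentioned in the comments follow publicly available rCUDA documentation and are assumptions here, the server name gpu-server-01 is hypothetical, and the options added to the modified Slurm scheduler are not named in the abstract and are therefore not shown.

    // Minimal CUDA vector addition. Under the remote GPU virtualization approach
    // described in the abstract, this unmodified code can run on a cluster node
    // without a local GPU: the rCUDA client library intercepts the CUDA Runtime
    // API calls below and forwards them (e.g., over InfiniBand) to a node that
    // does host a GPU. The environment variable names in the comments follow
    // public rCUDA documentation and are an assumption, not taken from the paper.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *ha = (float *)malloc(bytes);
        float *hb = (float *)malloc(bytes);
        float *hc = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        // With rCUDA, the allocations, copies, and kernel launch below are serviced
        // by a remote GPU selected before launching the binary, e.g. (assumed syntax):
        //   export RCUDA_DEVICE_COUNT=1
        //   export RCUDA_DEVICE_0=gpu-server-01   # hypothetical GPU server name
        float *da, *db, *dc;
        cudaMalloc(&da, bytes);
        cudaMalloc(&db, bytes);
        cudaMalloc(&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f\n", hc[0]);  // expected: 3.0

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }

Because the call interception happens inside the CUDA runtime library, the same binary can be scheduled onto a GPU-less node and still use a GPU elsewhere in the cluster, which is what allows the scheduler to raise overall GPU utilization and consolidate work onto fewer devices.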

Bibliographic details

  • Source
    Concurrency, practice and experience | 2021, Issue 2 | e5275.1-e5275.16 | 16 pages
  • Author affiliations

    Univ Jaume 1 Dept Engn & Ciencia Comp Campus Riu Sec Castellon De La Plana 12071 Spain|Univ Valencia Dept Comp Sci Valencia Spain;

    Univ Politecn Valencia Dept Informat Sistemas & Comp Valencia Spain;

    Queens Univ Belfast Sch Elect Elect Engn & Comp Sci Belfast Antrim North Ireland;

    Univ Politecn Valencia Dept Informat Sistemas & Comp Valencia Spain;

  • Indexing information
  • Format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    CUDA; data centers; GPU; InfiniBand; rCUDA; Slurm;
