Co-scheML: Interference-aware Container Co-scheduling Scheme Using Machine Learning Application Profiles for GPU Clusters

Abstract

Recently, efficient execution of applications on Graphics Processing Units (GPUs) has emerged as a research topic for increasing overall system throughput in cluster environments. Because current cluster orchestration platforms that use GPUs support only exclusive execution of one application per GPU, they may not fully utilize GPU resources, depending on application characteristics. Co-execution of GPU applications, however, causes interference arising from resource contention among the applications. If the diverse resource usage characteristics of GPU applications are not taken into account, unbalanced use of computing resources and performance degradation can result in a GPU cluster. This study introduces Co-scheML for co-execution of various GPU applications, such as High Performance Computing (HPC), Deep Learning (DL) training, and DL inference. Because interference is difficult to predict, an interference model is constructed by applying a Machine Learning (ML) model to GPU metrics. The Co-scheML scheduler then uses the predicted interference to determine the deployment of an application. Experimental results show that the Co-scheML strategy improves average job completion time by 23% and shortens the makespan by 22% on average, compared with baseline schedulers.
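The scheduling idea in the abstract (predict interference, then decide deployment) can be illustrated with a minimal sketch. This is not the authors' implementation: the metric names (`sm_util`, `mem_bw`), the `toy_interference` function standing in for a trained ML model, the `place` function, and the two-jobs-per-GPU limit are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical application profile: GPU metrics normalized to [0, 1].
Profile = Dict[str, float]


@dataclass
class Gpu:
    name: str
    running: List[Profile] = field(default_factory=list)


def toy_interference(new: Profile, resident: Profile) -> float:
    """Stand-in for a trained ML interference model (assumption).

    Scores contention as the combined demand above capacity (1.0),
    summed over the metrics of the new job.
    """
    return sum(max(0.0, new[k] + resident[k] - 1.0) for k in new)


def place(job: Profile, gpus: List[Gpu],
          predict: Callable[[Profile, Profile], float] = toy_interference,
          max_jobs_per_gpu: int = 2) -> Optional[Gpu]:
    """Greedily place `job` on the GPU with the lowest predicted interference.

    Returns the chosen GPU, or None if every GPU is already full.
    """
    best, best_score = None, float("inf")
    for gpu in gpus:
        if len(gpu.running) >= max_jobs_per_gpu:
            continue
        # An idle GPU has no resident jobs, so its score is 0.0.
        score = sum(predict(job, r) for r in gpu.running)
        if score < best_score:
            best, best_score = gpu, score
    if best is not None:
        best.running.append(job)
    return best


gpus = [
    Gpu("gpu0", [{"sm_util": 0.9, "mem_bw": 0.3}]),  # compute-heavy resident
    Gpu("gpu1", [{"sm_util": 0.2, "mem_bw": 0.8}]),  # bandwidth-heavy resident
]
job = {"sm_util": 0.7, "mem_bw": 0.2}
chosen = place(job, gpus)
print(chosen.name)
```

In this sketch the compute-heavy job is paired with the bandwidth-heavy resident because their combined demand stays within capacity on both metrics. A real predictor would be trained on measured co-execution profiles rather than computed from a fixed formula.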

Bibliographic Record

Source
IEEE International Conference on Cluster Computing | 2020 | pp. 104-108 | 5 pages
Authors
Sejin Kim; Yoonhee Kim
Original format: PDF
Keywords
GPU applications; interference; co-execution; Co-scheML; resource contention; GPU utilization;

Similar Literature

1. Tracing and Profiling Machine Learning Dataflow Applications on GPU [J]. Zins Pierre, Dagenais Michel. International Journal of Parallel Programming. 2019, Issue 5a6
2. Learning-Driven Interference-Aware Workload Parallelization for Streaming Applications in Heterogeneous Cluster [J]. Zhang Haitao, Geng Xin, Ma Huadong. IEEE Transactions on Parallel and Distributed Systems. 2021, Issue 1
3. A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server [J]. Chi-Jie Lu, Ling-Jing Kao. Engineering Applications of Artificial Intelligence. 2016, Issue octa
4. Toward Interference-aware GPU Container Co-scheduling Learning from Application Profiles [C]. Sejin Kim, Yoonhee Kim. IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion. 2020
5. Automatic transformation and optimization of applications on GPUs and GPU clusters. [D]. Ma, Wenjing. 2011
6. Machine Learning-Based Human Recognition Scheme Using a Doppler Radar Sensor for In-Vehicle Applications [O]. Eugin Hyun, Young-Seok Jin, Jae-Hyun Park. 2020
7. XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs [O]. Cheng Li, Abdul Dakkak, Jinjun Xiong. 2020
