Concurrency and Computation: Practice and Experience

Deep and reinforcement learning for automated task scheduling in large-scale cloud computing systems


Abstract

Cloud computing is undeniably becoming the main computing and storage platform for today's major workloads. From Internet of Things and Industry 4.0 workloads to big data analytics and decision-making jobs, cloud systems daily receive a massive number of tasks that need to be simultaneously and efficiently mapped onto cloud resources. Therefore, deriving an appropriate task scheduling mechanism that can minimize both tasks' execution delay and cloud resource consumption is of prime importance. Recently, the concept of cloud automation has emerged to reduce manual intervention and improve resource management in large-scale cloud computing workloads. In this article, we capitalize on this concept and propose four deep and reinforcement learning-based scheduling approaches to automate the process of scheduling large-scale workloads onto cloud computing resources, while reducing both the resource consumption and the task waiting time. These approaches are: reinforcement learning (RL), deep Q networks (DQN), recurrent neural networks with long short-term memory (RNN-LSTM), and deep reinforcement learning combined with LSTM (DRL-LSTM). Experiments conducted using real-world datasets from the Google Cloud Platform revealed that DRL-LSTM outperforms the other three approaches. The experiments also showed that DRL-LSTM reduces the CPU usage cost by up to 67% compared with shortest job first (SJF), and by up to 35% compared with both round robin (RR) and improved particle swarm optimization (PSO). Moreover, our DRL-LSTM solution decreases the RAM usage cost by up to 72% compared with SJF, by up to 65% compared with RR, and by up to 31.25% compared with the improved PSO.
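To make the RL-based scheduling idea concrete, the sketch below shows a minimal tabular Q-learning agent that assigns incoming tasks to virtual machines so as to reduce task waiting time. This is an illustrative toy, not the paper's actual formulation: the state encoding (coarse per-VM load buckets), the reward (negative waiting time), and all constants are hypothetical simplifications.

```python
import random

# Hypothetical sketch: tabular Q-learning for task-to-VM scheduling.
# State, action, and reward definitions are illustrative only.

NUM_VMS = 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

# Q-table keyed by (state, action); state = tuple of per-VM load buckets,
# action = index of the VM the task is assigned to.
q_table = {}

def get_q(state, action):
    return q_table.get((state, action), 0.0)

def bucketize(loads):
    # Coarse discretization of continuous VM loads into integer buckets.
    return tuple(min(int(l), 9) for l in loads)

def choose_vm(state):
    # Epsilon-greedy action selection over the VMs.
    if random.random() < EPSILON:
        return random.randrange(NUM_VMS)
    return max(range(NUM_VMS), key=lambda a: get_q(state, a))

def update(state, action, reward, next_state):
    # Standard Q-learning update rule.
    best_next = max(get_q(next_state, a) for a in range(NUM_VMS))
    q_table[(state, action)] = get_q(state, action) + ALPHA * (
        reward + GAMMA * best_next - get_q(state, action))

def simulate(num_tasks=1000):
    loads = [0.0] * NUM_VMS  # outstanding work queued on each VM
    for _ in range(num_tasks):
        task_len = random.uniform(1.0, 5.0)
        state = bucketize(loads)
        vm = choose_vm(state)
        wait = loads[vm]          # the task waits for the VM's queue to drain
        loads[vm] += task_len
        next_state = bucketize(loads)
        update(state, vm, -wait, next_state)  # penalize waiting time
        # Time advances: each VM processes one unit of work per step.
        loads = [max(0.0, l - 1.0) for l in loads]
    return loads

final_loads = simulate()
```

Because the reward penalizes the chosen VM's queue length, the agent gradually learns to spread tasks across VMs, which is the same intuition, at a much smaller scale, behind the learning-based schedulers evaluated in the article.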
