...
IEEE Transactions on Parallel and Distributed Systems

Learning-Driven Interference-Aware Workload Parallelization for Streaming Applications in Heterogeneous Cluster


Abstract

In the past few years, with the rapid development of CPU-GPU heterogeneous computing, the issue of task scheduling in heterogeneous clusters has attracted a great deal of attention. This problem becomes more challenging with the need for efficient co-execution of tasks on the GPUs. However, the uncertainty of the heterogeneous cluster and the interference caused by resource contention among co-executing tasks can lead to unbalanced use of computing resources and further degrade the performance of the computing platform. In this article, we propose a two-stage task scheduling approach for streaming applications based on deep reinforcement learning and neural collaborative filtering, which considers fine-grained task division and task interference on the GPU. Specifically, the Learning-Driven Workload Parallelization (LDWP) method selects an appropriate execution node for mutually independent tasks. Using a deep Q-network, the cluster-level scheduling model is learned online to perform the currently optimal scheduling actions according to the runtime status of the cluster environment and the characteristics of the tasks. The Interference-Aware Workload Parallelization (IAWP) method assigns subtasks with dependencies to the appropriate computing units, taking into account the interference among subtasks on the GPU by using neural collaborative filtering. To make the learning of the neural networks more efficient, we use pre-training in the two-stage scheduler. Besides, we use transfer learning to efficiently rebuild the task scheduling model from an existing model. We evaluate our learning-driven and interference-aware task scheduling approach on a prototype platform against other widely used methods. The experimental results show that the proposed strategy improves the throughput of the distributed computing system by 26.9 percent on average and improves GPU resource utilization by around 14.7 percent.
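
To make the LDWP stage concrete, below is a minimal sketch of how a deep Q-network could pick an execution node for an independent task: the state vector encodes runtime cluster status and task features, and each action corresponds to one candidate node. This is an illustration under assumptions, not the authors' implementation; PyTorch, the network sizes, and the names QNetwork, select_node, and td_update are all hypothetical.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector (cluster runtime status + task features) to
    Q-values, one per candidate execution node."""
    def __init__(self, state_dim, num_nodes, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_nodes),
        )

    def forward(self, state):
        return self.net(state)

def select_node(q_net, state, num_nodes, epsilon=0.1):
    """Epsilon-greedy node selection: explore occasionally, otherwise
    take the node with the highest predicted Q-value."""
    if random.random() < epsilon:
        return random.randrange(num_nodes)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def td_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One temporal-difference update on a replay batch of
    (states, actions, rewards, next_states) tensors."""
    states, actions, rewards, next_states = batch
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q_pred, rewards + gamma * q_next)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The reward signal in such a setup would typically reflect the scheduling objective (e.g., throughput or load balance), and the online learning described in the abstract corresponds to repeatedly applying td_update as new scheduling outcomes are observed.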
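The IAWP stage's interference estimation can be sketched in the spirit of neural collaborative filtering: embed a subtask and a candidate co-runner (or GPU computing unit), and let a small MLP score their expected interference, assigning the subtask to the unit with the lowest predicted score. The class and function names below (InterferencePredictor, pick_unit) and the slowdown-style score are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class InterferencePredictor(nn.Module):
    """NCF-style model: embeds a subtask and a candidate co-runner,
    then scores their expected interference (e.g., predicted slowdown)
    with a small MLP over the concatenated embeddings."""
    def __init__(self, num_subtasks, num_corunners, dim=32):
        super().__init__()
        self.task_emb = nn.Embedding(num_subtasks, dim)
        self.corun_emb = nn.Embedding(num_corunners, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, task_ids, corunner_ids):
        x = torch.cat([self.task_emb(task_ids),
                       self.corun_emb(corunner_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)

def pick_unit(model, task_id, candidate_units):
    """Assign the subtask to the computing unit with the lowest
    predicted interference among the candidates."""
    tasks = torch.full((len(candidate_units),), task_id, dtype=torch.long)
    units = torch.tensor(candidate_units, dtype=torch.long)
    with torch.no_grad():
        scores = model(tasks, units)
    return candidate_units[int(scores.argmin().item())]
```

In this view, the pre-training mentioned in the abstract would correspond to fitting such a predictor on profiled co-execution data before deployment, and transfer learning to reusing its learned embeddings when the scheduling model is rebuilt for a new cluster.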
