...
IEEE Transactions on Parallel and Distributed Systems

Learning-Driven Interference-Aware Workload Parallelization for Streaming Applications in Heterogeneous Cluster


Abstract

In the past few years, with the rapid development of CPU-GPU heterogeneous computing, the issue of task scheduling in heterogeneous clusters has attracted a great deal of attention. This problem becomes more challenging with the need for efficient co-execution of tasks on the GPUs. However, the uncertainty of the heterogeneous cluster and the interference caused by resource contention among co-executing tasks can lead to unbalanced use of computing resources and further degrade the performance of the computing platform. In this article, we propose a two-stage task scheduling approach for streaming applications based on deep reinforcement learning and neural collaborative filtering, which considers fine-grained task division and task interference on the GPU. Specifically, the Learning-Driven Workload Parallelization (LDWP) method selects an appropriate execution node for mutually independent tasks. Using a deep Q-network, the cluster-level scheduling model is learned online to perform the currently optimal scheduling actions according to the runtime status of the cluster environment and the characteristics of the tasks. The Interference-Aware Workload Parallelization (IAWP) method assigns subtasks with dependencies to the appropriate computing units, taking into account the interference among subtasks on the GPU by using neural collaborative filtering. To make the learning of the neural networks more efficient, we use pre-training in the two-stage scheduler. Besides, we use transfer learning to efficiently rebuild the task scheduling model from an existing model. We evaluate our learning-driven and interference-aware task scheduling approach on a prototype platform against other widely used methods. The experimental results show that the proposed strategy improves the throughput of the distributed computing system by 26.9 percent on average and improves GPU resource utilization by around 14.7 percent.
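
To make the LDWP stage concrete, below is a minimal sketch of how a deep Q-network could pick an execution node for an independent task: the state vector encodes runtime cluster status and task features, and each action corresponds to one candidate node. This is an illustration under assumptions, not the authors' implementation; PyTorch, the network sizes, and the names QNetwork, select_node, and td_update are all hypothetical.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector (cluster runtime status + task features) to
    Q-values, one per candidate execution node."""
    def __init__(self, state_dim, num_nodes, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_nodes),
        )

    def forward(self, state):
        return self.net(state)

def select_node(q_net, state, num_nodes, epsilon=0.1):
    """Epsilon-greedy node selection: explore occasionally, otherwise
    take the node with the highest predicted Q-value."""
    if random.random() < epsilon:
        return random.randrange(num_nodes)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def td_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One temporal-difference update on a replay batch of
    (states, actions, rewards, next_states) tensors."""
    states, actions, rewards, next_states = batch
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q_pred, rewards + gamma * q_next)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The reward signal in such a setup would typically reflect the scheduling objective (e.g., throughput or load balance), and the online learning described in the abstract corresponds to repeatedly applying td_update as new scheduling outcomes are observed.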
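The IAWP stage's interference estimation can be sketched in the spirit of neural collaborative filtering: embed a subtask and a candidate co-runner (or GPU computing unit), and let a small MLP score their expected interference, assigning the subtask to the unit with the lowest predicted score. The class and function names below (InterferencePredictor, pick_unit) and the slowdown-style score are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class InterferencePredictor(nn.Module):
    """NCF-style model: embeds a subtask and a candidate co-runner,
    then scores their expected interference (e.g., predicted slowdown)
    with a small MLP over the concatenated embeddings."""
    def __init__(self, num_subtasks, num_corunners, dim=32):
        super().__init__()
        self.task_emb = nn.Embedding(num_subtasks, dim)
        self.corun_emb = nn.Embedding(num_corunners, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, task_ids, corunner_ids):
        x = torch.cat([self.task_emb(task_ids),
                       self.corun_emb(corunner_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)

def pick_unit(model, task_id, candidate_units):
    """Assign the subtask to the computing unit with the lowest
    predicted interference among the candidates."""
    tasks = torch.full((len(candidate_units),), task_id, dtype=torch.long)
    units = torch.tensor(candidate_units, dtype=torch.long)
    with torch.no_grad():
        scores = model(tasks, units)
    return candidate_units[int(scores.argmin().item())]
```

In this view, the pre-training mentioned in the abstract would correspond to fitting such a predictor on profiled co-execution data before deployment, and transfer learning to reusing its learned embeddings when the scheduling model is rebuilt for a new cluster.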
