ProSteal: A Proactive Work Stealer for Bulk Synchronous Tasks Distributed on a Cluster of Heterogeneous Machines with Multiple Accelerators


Abstract

Work stealing is an effective load balancing technique in shared memory parallel programming. However, in a distributed setup, researchers have pointed out difficulties in termination detection and in sustaining a healthy steal success rate. Keeping unsuccessful steal attempts to a minimum is especially important with many-core accelerators (which have specialized engines for data copy-in and copy-out), as this ensures not only that the accelerators (or GPUs) stay busy but also that these copy engines work in parallel. A steal attempt by a GPU may dry up one or more stages in this pipeline of copy and execution engines. In a cluster environment, a similar problem occurs with the pipeline that overlaps remote data transfers with local computations. In this paper, we study the loss in compute-communication overlap as a result of work stealing. We also present a proactive stealing approach that recovers the lost overlap by regaining it at the stealer's end. We evaluate our technique over Unicorn, a framework that decomposes bulk synchronous computations over a cluster of nodes equipped with multiple CPUs and GPUs. Compared to conventional random victim selection with a half-steal strategy, our approach achieves a performance gain of 3.19x while convolving a 4 GB image with a 31×31 filter and 1.34x while multiplying two square matrices of one billion elements each over a 10-node cluster with 120 CPUs and 20 GPUs.
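
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of random victim selection with a half-steal strategy. It is not ProSteal and not Unicorn code; the function name `steal_half`, the per-worker deque layout, and the convention that thieves take from the tail are all assumptions made for illustration.

```python
import random
from collections import deque

def steal_half(queues, thief, rng):
    """One steal attempt by worker `thief` (hypothetical helper).

    queues: list of deques, one per worker. Owners are assumed to take
            work from the left end; thieves take from the right end.
    Returns (victim_index, number_of_tasks_stolen).
    """
    victims = [i for i in range(len(queues)) if i != thief]
    if not victims:
        return None, 0  # nobody to steal from
    victim = rng.choice(victims)        # random victim selection
    n = len(queues[victim]) // 2        # half-steal: take half the victim's tasks
    stolen = [queues[victim].pop() for _ in range(n)]
    queues[thief].extend(stolen)
    return victim, n                    # n == 0 means the attempt was unsuccessful

# Usage: worker 1 is idle and steals from the only other worker.
queues = [deque(range(8)), deque()]
victim, n = steal_half(queues, thief=1, rng=random.Random(0))
```

An attempt that lands on an empty victim returns `n == 0`; these are the unsuccessful steals that the abstract says should be kept to a minimum.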


Bibliographic details

Source
2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops | 2015 | pp. 17-26 | 10 pages
Conference location: Hyderabad (IN)
Authors
Beri Tarun; Bansal Sorav; Kumar Subodh;
Author affiliation

Indian Inst. of Technol. Delhi, New Delhi, India;

Original format: PDF
Text language: English
Keywords
Heterogeneous Architectures; High Performance Computing; Hybrid CPU-GPU Clusters; Multi Scheduling; Work Stealing;


