A Data-Oriented Method for Scheduling Dependent Tasks on High-Density Multi-GPU Systems

Abstract

Rapidly changing computer architectures improve performance, but they challenge programming environments to harness the potential of novel hardware efficiently. In particular, the high-density multi-GPU architecture delivers the unparalleled performance advantage of dense GPUs in a single server, yet it makes scheduling diverse, dependent tasks more difficult. We therefore propose a data-oriented method for scheduling dependent tasks on this architecture and provide an implementation. In our method, we model a parallel program as a collection of data-dependent tasks whose data dependencies are managed by an expressive matrix. We then develop a hierarchical scheduler infrastructure for this model: a top scheduler that queries the data-dependency matrix; three downstream schedulers that queue computation tasks assigned exclusively to the processor, exclusively to the accelerator, or to either; and a multitude of bottom schedulers, each supplying one processing element with its assigned tasks. We evaluate the scheduler on Strassen matrix multiplication and Cholesky matrix inversion on a computer with 8 Tesla K40 GPUs. The results show that our method offers efficient task parallelism while satisfying complex task dependencies. While advanced task-oriented schedulers have been widely designed for distributed systems, a lightweight data-driven scheduler could be a handy alternative for handling the dependent yet diverse tasks of data-intensive applications on novel high-density multi-accelerator systems.
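
The scheduling idea in the abstract can be illustrated with a small sketch. The abstract does not specify the matrix layout or the scheduler interfaces, so everything here is an assumption made for illustration: the `Task` fields, the task-by-data-item 0/1 matrix, the function names, and the single CPU worker and single GPU worker that stand in for the bottom schedulers. It runs sequentially in plain Python and performs no real GPU work; it is not the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; the paper's actual task model is not given.
    name: str
    reads: tuple     # data items the task consumes
    writes: tuple    # data items the task produces
    affinity: str    # "cpu", "gpu", or "any"


def build_dependency_matrix(tasks, data_items):
    """Assumed layout: matrix[i][j] == 1 if task i reads data item j."""
    col = {d: j for j, d in enumerate(data_items)}
    matrix = [[0] * len(data_items) for _ in tasks]
    for i, task in enumerate(tasks):
        for d in task.reads:
            matrix[i][col[d]] = 1
    return matrix


def schedule(tasks, data_items, initially_ready):
    """Return the (task name, worker) pairs in the order they are run."""
    col = {d: j for j, d in enumerate(data_items)}
    matrix = build_dependency_matrix(tasks, data_items)
    ready = [d in initially_ready for d in data_items]
    # Three downstream queues: CPU-only, GPU-only, and either.
    queues = {"cpu": deque(), "gpu": deque(), "any": deque()}
    dispatched = [False] * len(tasks)
    order = []
    while len(order) < len(tasks):
        # Top scheduler: query the matrix for tasks whose inputs are all ready.
        for i, task in enumerate(tasks):
            needs = (j for j, flag in enumerate(matrix[i]) if flag)
            if not dispatched[i] and all(ready[j] for j in needs):
                queues[task.affinity].append(i)
                dispatched[i] = True
        if not any(queues.values()):
            raise ValueError("remaining tasks have unsatisfiable dependencies")
        # Bottom level: each worker takes one task per round, preferring its
        # own queue and falling back to the shared "any" queue.
        ran = []
        for worker in ("cpu", "gpu"):
            queue = queues[worker] or queues["any"]
            if queue:
                ran.append((worker, queue.popleft()))
        # Outputs become ready only after the round completes.
        for worker, i in ran:
            order.append((tasks[i].name, worker))
            for d in tasks[i].writes:
                ready[col[d]] = True
    return order


demo_items = ["A", "B", "M1", "S1", "C"]
demo_tasks = [
    Task("mul_gpu", reads=("A", "B"), writes=("M1",), affinity="gpu"),
    Task("add_cpu", reads=("A",), writes=("S1",), affinity="cpu"),
    Task("combine", reads=("M1", "S1"), writes=("C",), affinity="any"),
]
demo_matrix = build_dependency_matrix(demo_tasks, demo_items)
demo_order = schedule(demo_tasks, demo_items, initially_ready={"A", "B"})
print(demo_order)
```

In this toy run, `combine` is released only after both `M1` and `S1` have been produced, and because its affinity is `"any"` it goes to whichever worker polls the shared queue first.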