Speculative execution on multi-GPU systems

Abstract

The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit it. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to accelerators in order to utilize them effectively. An accelerator is a processor with a different ISA and micro-architecture than the main CPU. These static partitioning schemes are effective when targeting a system with only a single accelerator. However, they are not robust to changes in the number of accelerators or the performance characteristics of future generations of accelerators. In previous work, we presented the Harmony execution model for computing on heterogeneous systems with several CPUs and accelerators. In this paper, we extend Harmony to target systems with multiple accelerators using control speculation to expose parallelism. We refer to this technique as Kernel Level Speculation (KLS). We argue that dynamic parallelization techniques such as KLS are sufficient to scale applications across several accelerators based on the intuition that there will be fewer distinct accelerators than cores within each accelerator. We use a complete prototype of the Harmony runtime that we developed to explore the design decisions and trade-offs in the implementation of KLS. We show that KLS improves parallelism to a sufficient degree while retaining a sequential programming model. We accomplish this by demonstrating good scaling of KLS on a highly heterogeneous system with three distinct accelerator types and ten processors.
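The abstract names control speculation at kernel granularity but does not describe its mechanics. The toy Python sketch below illustrates the general idea only, and it is not the Harmony runtime or its API. All names here (`run_speculative`, `condition_kernel`, `taken_kernel`, `not_taken_kernel`) are hypothetical. It assumes kernels are plain functions over a dictionary of state, that the branch-deciding kernel only reads state, and that worker threads stand in for separate accelerators.

```python
from concurrent.futures import ThreadPoolExecutor


def run_speculative(condition_kernel, taken_kernel, not_taken_kernel,
                    state, predict_taken=True):
    """Run a branch-deciding kernel and speculate on which kernel follows it.

    Returns (new_state, prediction_was_correct). Hypothetical helper for
    illustration; threads stand in for separate accelerators.
    """
    predicted = taken_kernel if predict_taken else not_taken_kernel
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Launch the kernel that resolves the branch.
        cond_future = pool.submit(condition_kernel, dict(state))
        # Launch the predicted successor at the same time, on a private
        # copy of the state so its effects can be discarded if needed.
        spec_future = pool.submit(predicted, dict(state))

        taken = cond_future.result()
        if taken == predict_taken:
            # Prediction was right: commit the speculative result.
            return spec_future.result(), True
    # Prediction was wrong: the speculative result is dropped (squashed)
    # and the correct kernel runs non-speculatively.
    actual = taken_kernel if taken else not_taken_kernel
    return actual(dict(state)), False


def is_positive(s):
    return s["x"] > 0          # branch condition; reads state only


def double(s):
    return {**s, "x": s["x"] * 2}


def reset(s):
    return {**s, "x": 0}


if __name__ == "__main__":
    print(run_speculative(is_positive, double, reset, {"x": 3}))
    print(run_speculative(is_positive, double, reset, {"x": -1}))
```

The caller still writes the program as a sequence (condition, then one successor), which mirrors the abstract's point about retaining a sequential programming model. Copying the state before speculating is one simple way to make a wrong guess harmless; a real runtime would presumably need a more economical scheme.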