
Speculative execution on multi-GPU systems


Abstract

The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to accelerators in order to utilize them effectively. An accelerator is a processor with a different ISA and micro-architecture than the main CPU. These static partitioning schemes are effective when targeting a system with only a single accelerator. However, they are not robust to changes in the number of accelerators or the performance characteristics of future generations of accelerators. In previous work, we presented the Harmony execution model for computing on heterogeneous systems with several CPUs and accelerators. In this paper, we extend Harmony to target systems with multiple accelerators using control speculation to expose parallelism. We refer to this technique as Kernel Level Speculation (KLS). We argue that dynamic parallelization techniques such as KLS are sufficient to scale applications across several accelerators based on the intuition that there will be fewer distinct accelerators than cores within each accelerator. In this paper, we use a complete prototype of the Harmony runtime that we developed to explore the design decisions and trade-offs in the implementation of KLS. We show that KLS improves parallelism to a sufficient degree while retaining a sequential programming model. We accomplish this by demonstrating good scaling of KLS on a highly heterogeneous system with three distinct accelerator types and ten processors.
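
The control-speculation idea behind KLS can be illustrated with a minimal sketch. This is not the Harmony runtime or its API: every function name here (`condition_kernel`, `taken_kernel`, `not_taken_kernel`, `run_speculative`) is hypothetical, and Python threads merely stand in for accelerators. The sketch assumes that a kernel on the predicted path starts before the kernel that decides the branch has finished, and that its result is committed only if the prediction turns out to be correct.

```python
from concurrent.futures import ThreadPoolExecutor


def condition_kernel(data):
    # Hypothetical kernel whose result decides which kernel runs next.
    return sum(data) > 10


def taken_kernel(data):
    # Hypothetical kernel on the "branch taken" path.
    return [x * 2 for x in data]


def not_taken_kernel(data):
    # Hypothetical kernel on the "branch not taken" path.
    return [x + 1 for x in data]


def run_speculative(data, predict_taken=True):
    """Run the branch kernel and the predicted successor concurrently.

    Returns (result, speculation_was_correct).
    """
    predicted = taken_kernel if predict_taken else not_taken_kernel
    fallback = not_taken_kernel if predict_taken else taken_kernel

    with ThreadPoolExecutor(max_workers=2) as pool:
        cond_future = pool.submit(condition_kernel, data)
        # The speculative kernel works on a private copy so that a
        # mis-speculation leaves the visible program state untouched.
        spec_future = pool.submit(predicted, list(data))

        taken = cond_future.result()
        if taken == predict_taken:
            # Correct prediction: commit the speculative result.
            return spec_future.result(), True
        # Wrong prediction: the speculative result is simply discarded.

    # Re-execute along the correct path, as a sequential program would.
    return fallback(data), False


if __name__ == "__main__":
    print(run_speculative([1, 2, 3, 4, 5]))  # prediction holds
    print(run_speculative([1, 2, 3]))        # prediction fails, fallback runs
```

The caller still sees a sequential program: the returned result is the same whether or not the prediction was correct, and only the amount of overlapped work changes.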
