Speculative execution on multi-GPU systems

Abstract

Parallel programming models and languages have lagged behind the advance of heterogeneous many-core processors, leaving a gap between the computational capability of modern systems and the ability of applications to exploit it. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to accelerators in order to use those accelerators effectively. An accelerator is a processor with a different ISA and micro-architecture from the main CPU. These static partitioning schemes are effective when targeting a system with only a single accelerator, but they are not robust to changes in the number of accelerators or in the performance characteristics of future accelerator generations. In previous work, we presented the Harmony execution model for computing on heterogeneous systems with several CPUs and accelerators. In this paper, we extend Harmony to systems with multiple accelerators, using control speculation to expose parallelism. We refer to this technique as Kernel Level Speculation (KLS). We argue that dynamic parallelization techniques such as KLS are sufficient to scale applications across several accelerators, based on the intuition that a system will contain fewer distinct accelerators than there are cores within each accelerator. Using a complete prototype of the Harmony runtime that we developed, we explore the design decisions and trade-offs in implementing KLS. We show that KLS increases parallelism sufficiently while retaining a sequential programming model, and we demonstrate good scaling of KLS on a highly heterogeneous system with three distinct accelerator types and ten processors.
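
The abstract describes KLS only at a high level. As a rough illustration of the general idea of control speculation at kernel granularity, the toy Python sketch below launches a control-dependent kernel before the branch that guards it has resolved, isolates its writes in a private copy, and commits or discards that copy once the branch outcome is known. This is our own simplified model, not the Harmony runtime or its API: the names `run_with_kls`, `norm_kernel`, and `scale_kernel` are hypothetical, threads stand in for accelerators (Python's GIL means there is no real speedup here), and the guarded kernel is assumed to have only a control dependence, not a data dependence, on the kernel that decides the branch.

```python
# Toy model of kernel-level control speculation (hypothetical; not Harmony's API).
# Threads stand in for accelerators; "kernels" are plain Python functions.
import copy
from concurrent.futures import ThreadPoolExecutor


def run_with_kls(pool, producer, producer_input, predicate, guarded, guarded_state,
                 predict_taken=True):
    """Run `producer`, whose result decides whether `guarded` should run.

    If `predict_taken` is true, `guarded` is launched speculatively on a private
    copy of `guarded_state` while `producer` is still running. Assumes `guarded`
    does not read the producer's output (control dependence only).
    Returns (producer_result, resulting_state, status).
    """
    producer_future = pool.submit(producer, producer_input)

    shadow_future = None
    if predict_taken:
        # Speculative writes go to a copy so a misprediction can be squashed.
        shadow = copy.deepcopy(guarded_state)
        shadow_future = pool.submit(guarded, shadow)

    result = producer_future.result()
    taken = predicate(result)  # the branch resolves only here

    if shadow_future is not None:
        speculative_state = shadow_future.result()
        if taken:
            return result, speculative_state, "committed"  # prediction was correct
        return result, guarded_state, "squashed"           # discard the copy

    # No speculation: execute in program order, as the sequential program would.
    if taken:
        return result, guarded(guarded_state), "sequential"
    return result, guarded_state, "sequential"


def norm_kernel(xs):
    """Stand-in kernel whose output feeds a branch condition."""
    return sum(x * x for x in xs)


def scale_kernel(ys):
    """Stand-in control-dependent kernel; doubles its argument in place."""
    for i in range(len(ys)):
        ys[i] *= 2
    return ys


if __name__ == "__main__":
    ys = [1, 2, 3]
    with ThreadPoolExecutor(max_workers=2) as pool:  # two "accelerators"
        # Branch taken: speculative result is committed.
        print(run_with_kls(pool, norm_kernel, [1, 2], lambda s: s > 4, scale_kernel, ys))
        # prints (5, [2, 4, 6], 'committed')
        # Branch not taken: speculative copy is discarded, ys is untouched.
        print(run_with_kls(pool, norm_kernel, [1, 1], lambda s: s > 4, scale_kernel, ys))
        # prints (2, [1, 2, 3], 'squashed')
```

The deep copy is simply the easiest way to make speculative writes discardable in a sketch; it ignores data-transfer costs and the question of which accelerator should run which kernel, both of which a practical runtime would have to address.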

Bibliographic record

Source: IEEE International Symposium on Parallel & Distributed Processing | 2010 | 12 pages
Authors: Diamos G.; Yalamanchili S.
Original format: PDF
Chinese Library Classification: TP311.138-53

Similar literature

1. GPU-SAM: Leveraging multi-GPU split-and-merge execution for system-wide real-time support [J]. Wookhyun Han, Hoon Sung Chwa, Hwidong Bae. The Journal of Systems and Software, 2016, issue jula.
2. The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution [J]. Mohamed Zahran. Computing Reviews, 2014, no. 7.
3. The Design and Implementation of Heterogeneous Multicore Systems for Energy-efficient Speculative Thread Execution [J]. Yangchun Luo, Wei-Chung Hsu, Antonia Zhai. ACM Transactions on Architecture and Code Optimization, 2013, no. 4.
4. Speculative execution on multi-GPU systems [C]. Diamos G., Yalamanchili S. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
5. Modeling execution and predicting performance in multi-GPU environments [D]. Schaa, Dana. 2009.
6. NMF-mGPU: non-negative matrix factorization on multi-GPU systems [O]. Edgardo Mejía-Roa, Daniel Tabas-Madrid, Javier Setoain. 2015.
7. Speculative Execution on Multi-GPU Systems [O]. Gregory Diamos, Sudhakar Yalamanchili. 2010.
