Speculative execution on multi-GPU systems

Abstract

The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to accelerators in order to utilize them effectively. An accelerator is a processor with a different ISA and micro-architecture than the main CPU. These static partitioning schemes are effective when targeting a system with only a single accelerator. However, they are not robust to changes in the number of accelerators or the performance characteristics of future generations of accelerators. In previous work, we presented the Harmony execution model for computing on heterogeneous systems with several CPUs and accelerators. In this paper, we extend Harmony to target systems with multiple accelerators using control speculation to expose parallelism. We refer to this technique as Kernel Level Speculation (KLS). We argue that dynamic parallelization techniques such as KLS are sufficient to scale applications across several accelerators based on the intuition that there will be fewer distinct accelerators than cores within each accelerator. In this paper, we use a complete prototype of the Harmony runtime that we developed to explore the design decisions and trade-offs in the implementation of KLS. We show that KLS improves parallelism to a sufficient degree while retaining a sequential programming model. We accomplish this by demonstrating good scaling of KLS on a highly heterogeneous system with three distinct accelerator types and ten processors.
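The abstract describes KLS only at a high level. The following is a minimal Python sketch of the general idea of control speculation at kernel granularity. It is not the Harmony runtime's API or policy. It makes three assumptions: kernels are pure functions of a shared program state, worker threads stand in for accelerators, and the branch prediction is a fixed static flag. The names `run_with_kls`, `is_large`, `scale`, and `shift` are hypothetical. While one "accelerator" evaluates the branch condition, another runs the kernel on the predicted path against a private copy of the state. That result is committed only if the prediction turns out to be correct.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def run_with_kls(state, cond_kernel, then_kernel, else_kernel, predict_taken=True):
    """Hypothetical kernel-level control speculation for one if/else.

    Assumes cond_kernel only reads `state`, and that the branch kernels
    depend on `state` but not on cond_kernel's result.
    Returns (result, committed); committed is True if the speculative
    result was kept.
    """
    spec_kernel = then_kernel if predict_taken else else_kernel
    # Private copy: a misspeculated kernel never touches committed state.
    spec_state = copy.deepcopy(state)
    # Two worker threads stand in for two accelerators (assumption).
    with ThreadPoolExecutor(max_workers=2) as pool:
        cond_future = pool.submit(cond_kernel, state)
        spec_future = pool.submit(spec_kernel, spec_state)
        taken = bool(cond_future.result())
        # This sketch simply waits for the speculative kernel;
        # a real runtime might cancel it on a misprediction.
        spec_result = spec_future.result()
    if taken == predict_taken:
        # Prediction was correct: commit the speculative result.
        return spec_result, True
    # Misprediction: discard the speculative result and run the correct
    # kernel non-speculatively on the original state.
    correct_kernel = else_kernel if predict_taken else then_kernel
    return correct_kernel(state), False


# Hypothetical example kernels operating on a dict-based program state.
def is_large(s):
    return sum(s["data"]) > 10


def scale(s):
    return [v * 2 for v in s["data"]]


def shift(s):
    return [v + 1 for v in s["data"]]


if __name__ == "__main__":
    state = {"data": [1, 2, 3]}
    print(run_with_kls(state, is_large, scale, shift, predict_taken=True))   # ([2, 3, 4], False)
    print(run_with_kls(state, is_large, scale, shift, predict_taken=False))  # ([2, 3, 4], True)
```

In this sketch the deep copy plays the role of the state versioning that any speculative runtime needs so that a wrong guess can be rolled back. The sequential program semantics are preserved either way, because the committed result always matches what the non-speculative if/else would have produced.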

Bibliographic Details

Source: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-12 (12 pages)
Conference location: Atlanta, GA (US)
Authors: Diamos G.; Yalamanchili S.
Affiliation: School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Original format: PDF
Language: English
CLC classification: TP311.133

Similar Literature

1. GPU-SAM: Leveraging multi-GPU split-and-merge execution for system-wide real-time support [J]. Wookhyun Han, Hoon Sung Chwa, Hwidong Bae. The Journal of Systems and Software. 2016, July issue.

2. The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution [J]. Mohamed Zahran. Computing Reviews. 2014, no. 7.

3. The Design and Implementation of Heterogeneous Multicore Systems for Energy-efficient Speculative Thread Execution [J]. Yangchun Luo, Wei-Chung Hsu, Antonia Zhai. ACM Transactions on Architecture and Code Optimization. 2013, no. 4.

4. Speculative execution on multi-GPU systems [C]. Diamos Gregory, Yalamanchili Sudhakar. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 2010.

5. Modeling execution and predicting performance in multi-GPU environments [D]. Schaa, Dana. 2009.

6. NMF-mGPU: non-negative matrix factorization on multi-GPU systems [O]. Edgardo Mejía-Roa, Daniel Tabas-Madrid, Javier Setoain. 2015.

7. Speculative Execution on Multi-GPU Systems [O]. Gregory Diamos, Sudhakar Yalamanchili. 2010.
