Compiler and Runtime Support for Enabling Generalized Reduction Computations on Heterogeneous Parallel Configurations

Abstract

A trend that has materialized, and has given rise to much attention, is of the increasingly heterogeneous computing platforms. Presently, it has become very common for a desktop or a notebook computer to come equipped with both a multi-core CPU and a GPU. Capitalizing on the maximum computational power of such architectures (i.e., by simultaneously exploiting both the multi-core CPU and the GPU) starting from a high-level API is a critical challenge. We believe that it would be highly desirable to support a simple way for programmers to realize the full potential of today's heterogeneous machines.

This paper describes a compiler and runtime framework that can map a class of applications, namely those characterized by generalized reductions, to a system with a multi-core CPU and GPU. Starting with simple C functions with added annotations, we automatically generate the middleware API code for the multi-core, as well as CUDA code to exploit the GPU simultaneously. The runtime system provides efficient schemes for dynamically partitioning the work between CPU cores and the GPU. Our experimental results from two applications, e.g., k-means clustering and Principal Component Analysis (PCA), show that, through effectively harnessing the heterogeneous architecture, we can achieve significantly higher performance compared to using only the GPU or the multi-core CPU. In k-means, the heterogeneous version with 8 CPU cores and a GPU achieved a speedup of about 32.09x relative to 1-thread CPU. When compared to the faster of CPU-only and GPU-only executions, we were able to achieve a performance gain of about 60%. In PCA, the heterogeneous version attained a speedup of 10.4x relative to the 1-thread CPU version. When compared to the faster of CPU-only and GPU-only versions, we achieved a performance gain of about 63.8%.
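The two ideas the abstract names, a generalized reduction and dynamic partitioning of work, can be illustrated with a minimal sketch. It is not the authors' generated middleware or CUDA code: Python threads stand in for the CPU cores and the GPU, and every name in it (`ChunkDispenser`, `new_reduction_object`, `accumulate`, `merge`, `kmeans_step`) is hypothetical. Each worker pulls chunks of the input on demand, folds them into a private reduction object, and the private objects are combined at the end.

```python
import threading


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    def dist2(center):
        return sum((p - c) ** 2 for p, c in zip(point, center))
    return min(range(len(centers)), key=lambda k: dist2(centers[k]))


def new_reduction_object(k, dim):
    """Reduction object: per-cluster coordinate sums and point counts."""
    return {"sums": [[0.0] * dim for _ in range(k)], "counts": [0] * k}


def accumulate(robj, point, centers):
    """Fold one data element into a reduction object."""
    k = nearest(point, centers)
    robj["counts"][k] += 1
    for d, value in enumerate(point):
        robj["sums"][k][d] += value


def merge(dst, src):
    """Combine two reduction objects by element-wise addition."""
    for k, count in enumerate(src["counts"]):
        dst["counts"][k] += count
        for d, value in enumerate(src["sums"][k]):
            dst["sums"][k][d] += value


class ChunkDispenser:
    """Hands out [start, end) index ranges on demand, so faster workers take more chunks."""

    def __init__(self, total, chunk_size):
        self._total = total
        self._chunk_size = chunk_size
        self._next = 0
        self._lock = threading.Lock()

    def next_chunk(self):
        with self._lock:
            if self._next >= self._total:
                return None
            start = self._next
            self._next = min(start + self._chunk_size, self._total)
            return start, self._next


def kmeans_step(points, centers, num_workers=4, chunk_size=2):
    """One k-means iteration expressed as a generalized reduction (assumed setup)."""
    k, dim = len(centers), len(centers[0])
    dispenser = ChunkDispenser(len(points), chunk_size)
    # One private reduction object per worker avoids locking on every update.
    local_objects = [new_reduction_object(k, dim) for _ in range(num_workers)]

    def worker(robj):
        while True:
            chunk = dispenser.next_chunk()
            if chunk is None:
                return
            for point in points[chunk[0]:chunk[1]]:
                accumulate(robj, point, centers)

    threads = [threading.Thread(target=worker, args=(robj,)) for robj in local_objects]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Global combination: merge all private reduction objects into one.
    total = new_reduction_object(k, dim)
    for robj in local_objects:
        merge(total, robj)

    new_centers = []
    for i in range(k):
        count = total["counts"][i]
        if count == 0:
            new_centers.append(tuple(centers[i]))  # keep an empty cluster's center
        else:
            new_centers.append(tuple(s / count for s in total["sums"][i]))
    return new_centers, total["counts"]
```

Because the updates are additions, the combined result for exactly representable inputs does not depend on which worker processed which chunk. That property is what lets a runtime hand chunks to whichever device becomes free first. The chunk size is a tuning knob in this sketch: smaller chunks balance load more finely at the cost of more trips to the dispenser.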


Bibliographic details

Source
24th ACM International Conference on Supercomputing 2010 | 2010 | pp. 137-146 | 10 pages
Conference location: Amsterdam (NL)
Authors
Vignesh T. Ravi; Wenjing Ma; David Chiu; Gagan Agrawal
Author affiliations
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 (all four authors)
Original format: PDF
Language: English
Chinese Library Classification: computing technology, computer technology
Keywords
generalized reductions; dynamic work distribution; multi-cores; GPGPU

