International Symposium on Parallel Distributed Processing

A framework for efficient and scalable execution of domain-specific templates on GPUs



Abstract

Graphics Processing Units (GPUs) have emerged as important players in the computing industry's transition from sequential to multi- and many-core computing. We propose a software framework for executing domain-specific parallel templates on GPUs that simultaneously raises the abstraction level of GPU programming and ensures efficient execution with forward scalability to large data sizes and new GPU platforms. To achieve scalable and efficient GPU execution, our framework focuses on two critical problems that have been largely ignored in previous efforts: processing large data sets that do not fit within GPU memory, and minimizing data transfers between the host and the GPU. Our framework takes domain-specific parallel programming templates expressed as parallel operator graphs and performs operator splitting, offload unit identification, and scheduling of offloaded computations and host-GPU data transfers to generate a highly optimized execution plan. Finally, a code generator produces a hybrid CPU/GPU program, in accordance with the derived execution plan, that uses lower-level frameworks such as CUDA. We have applied the proposed framework to templates from the recognition domain, specifically edge detection kernels and convolutional neural networks commonly used in image and video analysis. We present results on two different NVIDIA GPU platforms (a Tesla C870 GPU computing card and a GeForce 8800 graphics card) that demonstrate 1.7-7.8x performance improvements over already-accelerated baseline GPU implementations. We also demonstrate scalability to input data sets and application memory footprints of 6 GB and 17 GB, respectively, on GPU platforms with only 768 MB and 1.5 GB of memory.
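The abstract's central technique, splitting an operator so its data can be streamed through a GPU whose memory is smaller than the working set, can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the paper's actual API): a Sobel-style edge-detection convolution is split into row-wise chunks with one-row halos, so that each "transfer" fits an assumed device-memory budget. NumPy stands in for the GPU, and all names and the `max_rows_on_device` budget are illustrative.

```python
import numpy as np

# Sobel-X kernel, a common edge-detection operator.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

def convolve2d_valid(tile, kernel):
    """Plain 'valid' 2-D correlation applied to one chunk."""
    kh, kw = kernel.shape
    out_h = tile.shape[0] - kh + 1
    out_w = tile.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(tile[i:i + kh, j:j + kw] * kernel)
    return out

def split_execute(image, kernel, max_rows_on_device):
    """Process an image larger than 'device memory' chunk by chunk.

    Chunks overlap by the kernel radius (the halo) so that each output
    row is computed exactly once from complete input neighborhoods.
    """
    halo = kernel.shape[0] // 2          # rows of overlap on each side
    h = image.shape[0]
    outputs = []
    row = halo
    while row < h - halo:
        stop = min(row + max_rows_on_device, h - halo)
        # 'Transfer' only this chunk plus its halos to the device.
        tile = image[row - halo:stop + halo, :]
        outputs.append(convolve2d_valid(tile, kernel))
        row = stop
    return np.vstack(outputs)

# Reference: single-shot execution, as if memory were unlimited.
img = np.arange(20 * 8, dtype=np.float32).reshape(20, 8)
full = convolve2d_valid(img, SOBEL_X)
chunked = split_execute(img, SOBEL_X, max_rows_on_device=4)
assert np.allclose(full, chunked)
```

Because the chunks overlap by exactly the kernel radius, the chunked result is bit-identical to the monolithic one; the same halo reasoning underlies operator splitting for any stencil-shaped operator in a parallel operator graph.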
