Scalable Framework for Mapping Streaming Applications onto Multi-GPU Systems


Abstract

Graphics processing units leverage a large array of parallel processing cores to boost the performance of a specific streaming computation pattern frequently found in graphics applications. Unfortunately, while many other general purpose applications do exhibit the required streaming behavior, they also possess unfavorable data layout and poor computation-to-communication ratios that penalize any straightforward execution on the GPU. In this paper, we describe an efficient and scalable code generation framework that can map general purpose streaming applications onto a multi-GPU system. This framework spans the entire core and memory hierarchy exposed by the multi-GPU system. Several key features in our framework ensure the scalability required by complex streaming applications. First, we propose an efficient stream graph partitioning algorithm that partitions the complex application to achieve the best performance under a given shared memory constraint. Next, the resulting partitions are mapped to multiple GPUs using an efficient architecture-driven strategy. The mapping balances the workload while considering the communication overhead. Finally, a highly effective pipeline execution is employed for the execution of the partitions on the multi-GPU system. The framework has been implemented as a back-end of the StreamIt programming language compiler. Our comprehensive experiments show its scalability and significant performance speedup compared with a previous state-of-the-art solution.

Bibliographic details

Source
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2012 | pp. 1-10 | 10 pages
Conference location: New Orleans, LA (US)
Authors
Huynh Phung Huynh; Andrei Hagiescu; Weng-Fai Wong; Rick Siow Mong Goh;
Author affiliations

A*STAR Institute of High Performance Computing Singapore;

School of Computing National University of Singapore Singapore;

Original format: PDF
Keywords
Algorithms; Performance; Design;

Similar documents

1. Scalable Framework for Mapping Streaming Applications onto Multi-GPU Systems [J]. Huynh Phung Huynh, Andrei Hagiescu, Weng-Fai Wong. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2012, No. 8
2. GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems [J]. Fumihiko INO, Shinta NAKAGAWA, Kenichi HAGIHARA. IEICE Transactions on Information and Systems. 2013, No. 12
3. Scalable Framework for Mapping Streaming Applications onto Multi-GPU Systems [C]. Huynh Phung Huynh, Andrei Hagiescu, Weng-Fai Wong. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2012
4. A Memory-Aware Scheduling Framework for Streaming Applications on Multicore Systems [D]. Ma, Mingze. 2019
5. Deep-Framework: A Distributed Scalable and Edge-Oriented Framework for Real-Time Analysis of Video Streams [O]. Alessandro Sassu, Jose Francisco Saenz-Cogollo, Maurizio Agelli. 2021
6. Scalable Framework for Mapping Streaming Applications onto Multi-GPU Systems [O]. Huynh Phung Huynh, Andrei Hagiescu, Weng-Fai Wong. 2013
