An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness

Sunpyo Hong; Hyesoon Kim

Abstract

GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures to improve application performance is even more difficult. Current approaches rely on programmers to tune their applications by exploiting the design space exhaustively without fully understanding the performance characteristics of their applications.

To provide insights into the performance bottlenecks of parallel applications on GPU architectures, we propose a simple analytical model that estimates the execution time of massively parallel programs. The key component of our model is estimating the number of parallel memory requests (we call this the memory warp parallelism) by considering the number of running threads and memory bandwidth. Based on the degree of memory warp parallelism, the model estimates the cost of memory requests, thereby estimating the overall execution time of a program. Comparisons between the outcome of the model and the actual execution time in several GPUs show that the geometric mean of absolute error of our model on micro-benchmarks is 5.4% and on GPU computing applications is 13.3%. All the applications are written in the CUDA programming language.
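
The abstract does not give the model's equations, so the following is only a toy sketch of the idea it describes: bound the number of overlapping memory requests by the running warps and by memory bandwidth, then use that bound to amortize memory cost. The function names, every parameter (`departure_delay`, `bytes_per_warp`, and so on), and the formulas themselves are assumptions made for illustration; they should not be read as the model published in the paper.

```python
def memory_warp_parallelism(active_warps, mem_latency, departure_delay,
                            peak_bandwidth, bytes_per_warp, clock_hz, active_sms):
    """Toy estimate of how many warps per SM can overlap memory requests.

    All parameters are hypothetical inputs chosen for this illustration.
    """
    # Latency-bound limit: back-to-back requests that fit in one latency window.
    latency_limit = mem_latency / departure_delay
    # Bandwidth one warp's request consumes while it is in flight (bytes/second).
    bandwidth_per_warp = clock_hz * bytes_per_warp / mem_latency
    # Bandwidth-bound limit: peak bandwidth shared by all active SMs.
    bandwidth_limit = peak_bandwidth / (bandwidth_per_warp * active_sms)
    # Parallelism can never exceed the number of warps actually running.
    return min(latency_limit, bandwidth_limit, active_warps)


def estimated_memory_cycles(active_warps, requests_per_warp, mem_latency, mwp):
    """Toy memory cost: total request latency amortized over overlapping warps."""
    return active_warps * requests_per_warp * mem_latency / mwp


# Made-up machine parameters, not measurements from any real GPU.
mwp = memory_warp_parallelism(
    active_warps=16, mem_latency=400, departure_delay=40,
    peak_bandwidth=80e9, bytes_per_warp=128, clock_hz=1e9, active_sms=16)
print(mwp)                                        # 10.0 (latency-bound)
print(estimated_memory_cycles(16, 4, 400, mwp))   # 2560.0
```

With these made-up numbers the latency limit (10) is the binding one. Lowering `peak_bandwidth` to 20e9 makes the bandwidth limit bind instead, which is the kind of bottleneck shift the abstract says the model is meant to expose.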

Bibliographic record

Source: Computer Architecture News, 2009, Issue 3, pp. 152-163 (12 pages)
Authors: Sunpyo Hong; Hyesoon Kim
Affiliations: Electrical and Computer Engineering, Georgia Institute of Technology; School of Computer Science, Georgia Institute of Technology
Original format: PDF
Language: English
Keywords: analytical model; CUDA; GPU architecture; memory level parallelism; warp level parallelism; performance estimation
