Published in: Computer Architecture News

An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness

Abstract

GPU architectures are increasingly important in the multi-core era because of their large number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers. Understanding the performance bottlenecks of those parallel programs on GPU architectures, which is necessary to improve application performance, is even more difficult. Current approaches rely on programmers to tune their applications by exploring the design space exhaustively, without fully understanding the performance characteristics of their applications.

To provide insight into the performance bottlenecks of parallel applications on GPU architectures, we propose a simple analytical model that estimates the execution time of massively parallel programs. The key component of our model estimates the number of parallel memory requests, which we call the memory warp parallelism, by considering the number of running threads and the memory bandwidth. Based on the degree of memory warp parallelism, the model estimates the cost of memory requests and, from that, the overall execution time of a program.

We compared the model's predictions with actual execution times on several GPUs. The geometric mean of the absolute error of our model is 5.4% on micro-benchmarks and 13.3% on GPU computing applications. All the applications are written in the CUDA programming language.
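The estimation chain described above runs from running warps and memory bandwidth, to memory warp parallelism (MWP), to memory cost, and finally to execution time. The minimal Python sketch below illustrates that chain. It is not the paper's model. The parameter names (`mem_latency`, `departure_delay`, `bytes_per_warp`, and so on) are hypothetical. Three things are simplifying assumptions made here for illustration: the latency-over-departure-delay limit on MWP, the per-SM bandwidth-share limit, and the final `max(memory, compute)` combination. The published model uses its own, more detailed equations, and the sample numbers are invented rather than measured on any GPU.

```python
from dataclasses import dataclass


@dataclass
class KernelParams:
    # Hypothetical parameter set; values are in cycles, bytes, or bytes per cycle.
    mem_latency: float        # cycles for one memory request to return
    departure_delay: float    # cycles between successive memory warps leaving an SM
    mem_bandwidth: float      # whole-GPU memory bandwidth, bytes per cycle
    bytes_per_warp: float     # bytes moved by one warp's memory instruction
    active_sms: int           # SMs running this kernel concurrently
    active_warps_per_sm: int  # N: warps resident on one SM
    comp_cycles: float        # compute cycles per warp in one round
    mem_insts: int            # memory instructions per warp in one round
    repetitions: int          # rounds of warps each SM executes


def memory_warp_parallelism(p: KernelParams) -> float:
    """Assumed MWP: how many warps can have memory requests in flight at once."""
    # Latency limit: warps that can issue before the first request returns.
    latency_limit = p.mem_latency / p.departure_delay
    # Bandwidth limit: one warp consumes bytes_per_warp / mem_latency bytes per
    # cycle, and each SM is assumed to get an equal share of total bandwidth.
    bw_per_warp = p.bytes_per_warp / p.mem_latency
    bandwidth_limit = p.mem_bandwidth / (bw_per_warp * p.active_sms)
    # MWP can never exceed the number of running warps.
    return min(latency_limit, bandwidth_limit, p.active_warps_per_sm)


def estimate_cycles(p: KernelParams) -> float:
    """Simplified execution-time estimate (not the paper's case analysis)."""
    n = p.active_warps_per_sm
    mwp = memory_warp_parallelism(p)
    # Cost of one warp's memory requests if none of them overlapped.
    mem_cycles = p.mem_latency * p.mem_insts
    # N warps' requests are served roughly MWP at a time.
    memory_cost = mem_cycles * n / mwp
    # Warps on one SM share its pipeline, so their compute is serialized.
    compute_cost = p.comp_cycles * n
    # Assumption: each round is bounded by whichever cost dominates.
    return max(memory_cost, compute_cost) * p.repetitions


# Invented example values, for illustration only.
p = KernelParams(mem_latency=400, departure_delay=4, mem_bandwidth=100,
                 bytes_per_warp=128, active_sms=30, active_warps_per_sm=32,
                 comp_cycles=100, mem_insts=4, repetitions=2)
print(f"MWP = {memory_warp_parallelism(p):.2f}, cycles = {estimate_cycles(p):.1f}")
# MWP = 10.42, cycles = 9830.4
```

With these numbers, the bandwidth limit (about 10.4 warps) is tighter than both the latency limit (100) and the 32 running warps. The kernel is therefore memory-bound, and raising `mem_bandwidth` moves MWP up toward N.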
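The accuracy figures in the abstract are geometric means of per-program absolute errors. The short Python sketch below computes that metric. It assumes each error is the relative difference |predicted − measured| / measured, because the abstract reports percentages but does not spell out the definition. The helper names and sample timings are hypothetical.

```python
import math
from typing import Sequence, Tuple


def absolute_error(predicted: float, measured: float) -> float:
    # Assumed definition: relative error magnitude (0.05 means 5%).
    return abs(predicted - measured) / measured


def geometric_mean_abs_error(pairs: Sequence[Tuple[float, float]]) -> float:
    """Geometric mean of absolute errors over (predicted, measured) pairs."""
    if not pairs:
        raise ValueError("need at least one (predicted, measured) pair")
    errors = [absolute_error(pred, meas) for pred, meas in pairs]
    if any(e == 0 for e in errors):
        # log(0) is undefined, so a perfect prediction breaks the geometric mean.
        raise ValueError("geometric mean is undefined when an error is zero")
    return math.exp(sum(math.log(e) for e in errors) / len(errors))


# Hypothetical (predicted, measured) execution times in milliseconds.
samples = [(110.0, 100.0), (96.0, 100.0)]
print(f"{geometric_mean_abs_error(samples):.2%}")  # 6.32%
```

A geometric mean is never larger than the arithmetic mean of the same errors, and a single badly predicted program dominates it less. That makes it a common choice for summarizing errors across benchmarks.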
