Conference: Symposium on Application Accelerators in High Performance Computing

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density


Abstract

With the rising number of application accelerators, developers are looking for ways to evaluate new and competing platforms quickly, fairly, and early in the development cycle. As high-performance computing (HPC) applications increase their demands on application acceleration platforms, graphics processing units (GPUs) offer a potential solution for many developers seeking higher performance. Device performance metrics, such as Computational Density (CD), provide a useful but limited starting point for device comparison. The authors developed the Realizable Utilization (RU) metric and methodology to quantify the discrepancy between the theoretical device performance indicated by CD and the performance developers can actually achieve. A higher RU score means the application achieves a larger percentage of the computational power the device can provide. The authors survey technical publications about GPUs and use this data to analyze the RU scores for several arithmetic application kernels that are frequently accelerated on GPUs. The RU concepts presented in this paper are a first step towards a formalized comparison framework for diverse devices such as CPUs, FPGAs, GPUs, and other novel architectures. GPU kernels for matrix multiplication, matrix decomposition, and N-body simulations show RU scores ranging from almost 0% to nearly 99% depending on the application, but all kernel areas show a significant decrease in RU as computational capacity increases. Additionally, the RU scores show higher realized performance for the GeForce 8 Series GPUs than for newer GPU architectures. This paper shows that applications running on GPUs with higher computational density report significantly lower RU scores than those running on more mature GPUs with lower computational density. This trend implies that while the raw performance available is still increasing with newer GPUs, the achieved performance is not keeping pace with the theoretical capacities of the devices.
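The abstract does not give the RU formula, so the following is only a minimal sketch. It assumes RU is the ratio of achieved throughput to the theoretical peak implied by CD, expressed as a percentage. The function name, the GFLOPS unit, and the two example devices are hypothetical illustrations and are not taken from the paper.

```python
def realizable_utilization(achieved_gflops: float, peak_gflops: float) -> float:
    """Return RU as a percentage of theoretical peak.

    Assumption: RU = 100 * achieved / peak. The abstract only says that a
    higher RU means a larger share of the device's computational power is
    being realized; the exact definition may differ in the paper.
    """
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    if achieved_gflops < 0:
        raise ValueError("achieved_gflops must be non-negative")
    return 100.0 * achieved_gflops / peak_gflops


# Hypothetical numbers, chosen only to illustrate the trend described in
# the abstract: (achieved GFLOPS, theoretical peak GFLOPS).
EXAMPLE_DEVICES = {
    "older_gpu": (300.0, 500.0),
    "newer_gpu": (400.0, 1000.0),
}

if __name__ == "__main__":
    for name, (achieved, peak) in EXAMPLE_DEVICES.items():
        ru = realizable_utilization(achieved, peak)
        print(f"{name}: achieved={achieved:.0f} GFLOPS, RU={ru:.1f}%")
```

With these made-up inputs, the newer device delivers more raw throughput yet realizes a smaller fraction of its peak, which is the pattern the abstract reports for GPUs with higher computational density.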
