Conference paper, 2012 Symposium on Application Accelerators in High Performance Computing

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Abstract

With the rising number of application accelerators, developers are looking for ways to evaluate new and competing platforms quickly, fairly, and early in the development cycle. As high-performance computing (HPC) applications place increasing demands on application acceleration platforms, graphics processing units (GPUs) offer a potential solution for many developers seeking higher performance. Device performance metrics such as Computational Density (CD) provide a useful but limited starting point for device comparison. The authors developed the Realizable Utilization (RU) metric and methodology to quantify the discrepancy between the theoretical device performance indicated by CD and the performance developers can actually achieve. A higher RU score means the application is achieving a larger percentage of the computational power the device can provide. The authors survey technical publications about GPUs and use these data to analyze the RU scores of several arithmetic application kernels that are frequently accelerated on GPUs. The RU concepts presented in this paper are a first step towards a formalized comparison framework for diverse devices such as CPUs, FPGAs, GPUs, and other novel architectures. GPU kernels for matrix multiplication, matrix decomposition, and N-body simulation show RU scores ranging from nearly 0% to almost 99% depending on the application, but all kernel areas show a significant decrease in RU as computational capacity increases. Additionally, the RU scores show higher realized performance for GeForce 8 Series GPUs than for newer GPU architectures. This paper shows that applications running on GPUs with higher computational density report significantly lower RU scores than those running on more mature GPUs with lower computational density. This trend implies that while the raw performance available is still increasing with newer GPUs, achieved performance is not keeping pace with the theoretical capacities of the devices.
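The abstract does not state the RU formula explicitly. A minimal sketch, assuming RU is the ratio of achieved throughput to theoretical peak, RU = P_achieved / CD, and assuming CD is computed as parallel units × operations per unit per cycle × clock frequency, looks like the following. The `Device` class, its field names, and both function names are hypothetical and used only for illustration; the numbers in the usage example are made up, not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description (illustrative only, not from the paper)."""
    name: str
    parallel_units: int          # assumed: number of parallel ALUs / stream processors
    ops_per_unit_per_cycle: int  # assumed: e.g. 2 for a fused multiply-add
    clock_hz: float              # core clock in Hz


def computational_density(dev: Device) -> float:
    """Theoretical peak in ops/s, assuming CD = units * ops/cycle * clock."""
    return dev.parallel_units * dev.ops_per_unit_per_cycle * dev.clock_hz


def realizable_utilization(achieved_ops_per_s: float, dev: Device) -> float:
    """Assumed RU: fraction of the device's CD that an application reaches."""
    cd = computational_density(dev)
    if cd <= 0:
        raise ValueError("computational density must be positive")
    if achieved_ops_per_s < 0:
        raise ValueError("achieved performance cannot be negative")
    return achieved_ops_per_s / cd


if __name__ == "__main__":
    # Hypothetical device: 100 units * 2 ops/cycle * 1 GHz = 200 Gops/s peak.
    dev = Device("hypothetical-gpu", parallel_units=100,
                 ops_per_unit_per_cycle=2, clock_hz=1.0e9)
    ru = realizable_utilization(50e9, dev)
    # prints: CD = 200.0 Gops/s, RU = 25.0%
    print(f"CD = {computational_density(dev) / 1e9:.1f} Gops/s, RU = {ru:.1%}")
```

Under this assumed definition, the trend the abstract reports follows directly: if a newer device doubles CD while a kernel's achieved throughput stays flat, that kernel's RU halves.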