Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Abstract

With the rising number of application accelerators, developers are looking for ways to evaluate new and competing platforms quickly, fairly, and early in the development cycle. As high-performance computing (HPC) applications increase their demands on application acceleration platforms, graphics processing units (GPUs) provide a potential solution for many developers looking for increased performance. Device performance metrics, such as Computational Density (CD), provide a useful but limited starting point for device comparison. The authors developed the Realizable Utilization (RU) metric and methodology to quantify the discrepancy between theoretical device performance shown by CD and the performance developers can achieve. As the RU score increases, the application is achieving a larger percentage of the computational power the device can provide. The authors survey technical publications about GPUs and use this data to analyze the RU scores for several arithmetic application kernels that are frequently accelerated in GPUs. The RU concepts presented in this paper are a first step towards a formalized comparison framework for diverse devices such as CPUs, FPGAs, GPUs and other novel architectures. GPU kernels for matrix multiplication, matrix decomposition, and N-body simulations show RU scores ranging from almost 0% to approaching 99% depending on the application, but all kernel areas show a significant decrease in RU as the computational capacities increase. Additionally, the RU scores show the higher realized performance of the GeForce 8 Series GPUs versus newer GPU architectures. This paper shows that applications running on GPUs with higher computational density report significantly lower RU scores than more mature GPUs with lower computational density. This trend implies that while the raw performance available is still increasing with newer GPUs, the achieved performance is not keeping pace with the theoretical capacities of the devices.
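The abstract describes RU as the share of a device's theoretical computational power that an application actually achieves, but it does not give the formula. A minimal sketch, assuming RU is simply achieved throughput divided by the theoretical peak given by CD, expressed as a percentage (the function name, the GFLOPS units, and every number below are hypothetical and are not taken from the paper):

```python
def realizable_utilization(achieved_gflops: float, peak_gflops: float) -> float:
    """Return achieved performance as a percentage of theoretical peak.

    Assumption: RU = 100 * achieved / peak. The abstract only says that a
    higher RU means a larger share of the device's capability is realized.
    """
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    if achieved_gflops < 0:
        raise ValueError("achieved_gflops must be non-negative")
    return 100.0 * achieved_gflops / peak_gflops


# Made-up illustrative figures, not measurements from the paper:
# (label, achieved GFLOPS, theoretical peak GFLOPS)
samples = [
    ("lower-CD device", 200.0, 400.0),
    ("higher-CD device", 300.0, 1200.0),
]

for label, achieved, peak in samples:
    ru = realizable_utilization(achieved, peak)
    print(f"{label}: RU = {ru:.1f}%")
```

With these invented inputs the higher-CD device delivers more raw throughput yet a lower RU, which is the shape of the trend the abstract reports: achieved performance grows more slowly than theoretical capacity.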

Bibliographic record

Source: 2012 Symposium on Application Accelerators in High Performance Computing, 2012, pp. 137-140 (4 pages)
Conference location: Argonne, IL (US)
Authors: Richardson, Justin W.; George, Alan D.; Lam, Herman
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. Analysis of Fixed, Reconfigurable, and Hybrid Devices with Computational, Memory, I/O, & Realizable-Utilization Metrics [J]. Justin Richardson, Alan George, Kevin Cheng. ACM Transactions on Reconfigurable Technology and Systems, 2017, No. 1.
2. Performance analysis of a novel GPU computation-to-core mapping scheme for robust facet image modeling [J]. Park Seung In, Cao Yong, Watson Layne T. Journal of Real-Time Image Processing, 2015, No. 3.
3. Research Activity in Computational Physics utilizing High Performance Computing: Co-authorship Network Analysis [J]. Sul-Ah Ahn, Youngim Jung. Journal of Physics: Conference Series, 2016, No. 1.
4. Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density [C]. Richardson Justin W., George Alan D., Lam Herman. Symposium on Application Accelerators in High Performance Computing, 2012.
5. Performance analysis of reconfigurable logic accelerators in heterogenous multi-core architectures [D]. LaBroad, Jonathan. 2011.
6. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs: A Case Study with Microscopy Image Analysis [O]. George Teodoro, Tahsin Kurc, Guilherme Andrade.
7. Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU [O]. Hongjun Choi, Dongoh Son, Jongmyon Kim. 2015.
