Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Abstract

With the rising number of application accelerators, developers are looking for ways to evaluate new and competing platforms quickly, fairly, and early in the development cycle. As high-performance computing (HPC) applications increase their demands on application acceleration platforms, graphics processing units (GPUs) provide a potential solution for many developers looking for increased performance. Device performance metrics, such as Computational Density (CD), provide a useful but limited starting point for device comparison. The authors developed the Realizable Utilization (RU) metric and methodology to quantify the discrepancy between theoretical device performance shown by CD and the performance developers can achieve. As the RU score increases, the application is achieving a larger percentage of the computational power the device can provide. The authors survey technical publications about GPUs and use this data to analyze the RU scores for several arithmetic application kernels that are frequently accelerated in GPUs. The RU concepts presented in this paper are a first step towards a formalized comparison framework for diverse devices such as CPUs, FPGAs, GPUs and other novel architectures. GPU kernels for matrix multiplication, matrix decomposition, and N-body simulations show RU scores ranging from almost 0% to approaching 99% depending on the application, but all kernel areas show a significant decrease in RU as the computational capacities increase. Additionally, the RU scores show the higher realized performance of the GeForce 8 Series GPUs versus newer GPU architectures. This paper shows that applications running on GPUs with higher computational density report significantly lower RU scores than more mature GPUs with lower computational density. This trend implies that while the raw performance available is still increasing with newer GPUs, the achieved performance is not keeping pace with the theoretical capacities of the devices.
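The relationship the abstract describes, RU as the percentage of a device's theoretical capability that an application actually achieves, can be sketched in a few lines. This is a minimal illustration only: it assumes RU is simply achieved throughput divided by the CD-derived peak, which may differ from the exact formulation in the paper. The function name `realizable_utilization` and all numeric figures are hypothetical and are not taken from the paper's data.

```python
def realizable_utilization(achieved_gflops: float, peak_gflops: float) -> float:
    """Return achieved performance as a percentage of theoretical peak.

    Assumption: RU = 100 * achieved / peak. The paper's exact definition
    is not given in the abstract, so this is only an illustrative reading.
    """
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    if achieved_gflops < 0:
        raise ValueError("achieved_gflops must be non-negative")
    return 100.0 * achieved_gflops / peak_gflops


# Hypothetical figures, chosen only to illustrate the trend in the abstract:
# the newer device delivers more raw performance but a lower RU score.
older_ru = realizable_utilization(achieved_gflops=200.0, peak_gflops=400.0)
newer_ru = realizable_utilization(achieved_gflops=300.0, peak_gflops=1200.0)

print(f"older device RU: {older_ru:.1f}%")  # 50.0%
print(f"newer device RU: {newer_ru:.1f}%")  # 25.0%
```

In this made-up case the newer device achieves 1.5 times the throughput of the older one, yet its RU is halved because its theoretical peak tripled. That is the pattern the abstract reports: achieved performance rises, but not in step with theoretical capacity.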

Bibliographic details

Source
Symposium on Application Accelerators in High Performance Computing, 2012, 4 pages
Authors
Richardson Justin W.; George Alan D.; Lam Herman
Original format: PDF
Chinese Library Classification: TP3-53

Similar literature

1. Analysis of Fixed, Reconfigurable, and Hybrid Devices with Computational, Memory, I/O, & Realizable-Utilization Metrics [J]. Justin Richardson, Alan George, Kevin Cheng. ACM Transactions on Reconfigurable Technology and Systems, 2017, No. 1.
2. Performance analysis of a novel GPU computation-to-core mapping scheme for robust facet image modeling [J]. Park Seung In, Cao Yong, Watson Layne T. Journal of Real-Time Image Processing, 2015, No. 3.
3. Research Activity in Computational Physics utilizing High Performance Computing: Co-authorship Network Analysis [J]. Sul-Ah Ahn, Youngim Jung. Journal of Physics: Conference Series, 2016, No. 1.
4. Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density [C]. Richardson Justin W., George Alan D., Lam Herman. 2012 Symposium on Application Accelerators in High Performance Computing, 2012.
5. Performance analysis of reconfigurable logic accelerators in heterogenous multi-core architectures [D]. LaBroad, Jonathan. 2011.
6. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs GPUs and MICs: A Case Study with Microscopy Image Analysis [O]. George Teodoro, Tahsin Kurc, Guilherme Andrade.
7. Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU [O]. Hongjun Choi, Dongoh Son, Jongmyon Kim. 2015.
