Efficient Performance Evaluation for Highly Multi-threaded Graphics Processors

Abstract

With the emergence of highly multithreaded architectures, an effective performance monitoring system must reflect the interaction between a large number of concurrent events, and associate the overall effect of individual events and inefficiencies to the operations in the application source code. The state-of-the-art performance counters in highly multithreaded graphic processors currently do not provide this level of precision. Although fine-grained sampling of performance counters after each source-level operation could potentially achieve the desired precision, the high frequency of sampling required will likely cause too much distortion to the actual application behavior and make the sampled counter values inaccurate.

In this thesis, I present a novel software-based approach for monitoring the memory hierarchy performance in highly multithreaded general-purpose graphics processors. The proposed analysis is based on memory traces collected for small snapshots of application execution. A trace-based memory hierarchy model with a Monte Carlo experimental methodology generates statistical bounds of performance measures in the presence of nonuniform thread interleaving and data sharing in a highly multithreaded execution environment. The statistical approach overcomes the classical problem of disturbed execution timing due to instrumentation. The approach scales well as I deploy a minimal sampling technique to reduce the trace generation overhead and model simulation time.

The proposed scheme also keeps track of individual memory operations in the source code and can quantify the amount of their contribution to detrimental effects on memory system performance. A cross-validation of the model results shows close agreement with the values read from the hardware performance counters on an NVIDIA Tesla C2050.

I later use the predicted memory hierarchy performance statistics in an analytical model to identify performance characteristics of a kernel and its expected execution time. To account for the systematic error present in the predictions, I approximate the error function and express a range of potential true execution times for each predicted value.
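
The general idea of a trace-based model evaluated with Monte Carlo trials can be illustrated with a small Python sketch. This is not the thesis's model: the fully associative LRU cache, the uniform random choice of the next thread, the percentile bounds, and every function name and parameter value below are assumptions made only for illustration. Each trial draws one random interleaving of per-thread address traces (preserving each thread's own order), replays it through the cache, and records the hit rate; the spread across trials gives a statistical range.

```python
import random
from collections import OrderedDict


def interleave(traces, rng):
    """Randomly merge per-thread traces while keeping each thread's order."""
    cursors = [0] * len(traces)
    remaining = [i for i, t in enumerate(traces) if t]
    merged = []
    while remaining:
        k = rng.randrange(len(remaining))   # pick the next thread to issue
        tid = remaining[k]
        merged.append(traces[tid][cursors[tid]])
        cursors[tid] += 1
        if cursors[tid] == len(traces[tid]):
            remaining.pop(k)                # this thread's trace is exhausted
    return merged


def lru_hit_rate(addresses, num_lines, line_bytes):
    """Hit rate of a fully associative LRU cache (an assumed, simplified model)."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            hits += 1
            cache.move_to_end(line)         # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)   # evict the least recently used line
    return hits / len(addresses) if addresses else 0.0


def monte_carlo_hit_rate(traces, num_lines=4, line_bytes=128, trials=200, seed=0):
    """Return (5th percentile, median, 95th percentile) of hit rate over trials."""
    rng = random.Random(seed)
    rates = sorted(
        lru_hit_rate(interleave(traces, rng), num_lines, line_bytes)
        for _ in range(trials)
    )
    last = trials - 1
    return rates[int(0.05 * last)], rates[last // 2], rates[int(0.95 * last)]


if __name__ == "__main__":
    # Hypothetical traces: two threads that share some cache lines.
    example = [
        [0, 128, 256, 0, 128, 256],
        [512, 640, 0, 768, 896, 0],
    ]
    low, median, high = monte_carlo_hit_rate(example)
    print(f"hit rate: low={low:.3f} median={median:.3f} high={high:.3f}")
```

Because the interleavings are drawn at random instead of being measured on instrumented hardware, the result is a range of plausible hit rates, not a single value tied to one perturbed execution order.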

Bibliographic details

Author: Sadeghi Baghsorkhi, Sara
Year: 2011
Original format: PDF
Language: English

Similar documents

1. Efficient Performance Evaluation of Memory Hierarchy for Highly Multithreaded Graphics Processors [J]. Sara S. Baghsorkhi, Isaac Gelado, Matthieu Delahaye. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2012, No. 8.
2. An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing [J]. Jung-Wook Park, Hoon-Mo Yang, Gi-Ho Park. Journal of Parallel and Distributed Computing, 2010, No. 11.
3. Accelerating the RTTOV-7 IASI and AMSU-A radiative transfer models on graphics processing units: evaluating central processing unit/graphics processing unit-hybrid and pure-graphics processing unit approaches [J]. Jarno Mielikainen, Bormin Huang, Hung-Lung Allen Huang. Journal of Applied Remote Sensing, 2011.
4. Efficient Performance Evaluation of Memory Hierarchy for Highly Multithreaded Graphics Processors [C]. Sara S. Baghsorkhi, Isaac Gelado, Matthieu Delahaye. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012.
5. Performance evaluation of multi-threaded system vs. chip-multi processor system [D]. Kim, Hoyoung. 2012.
6. On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters [O]. David B. Williams-Young, Wibe A. de Jong, Hubertus J. J. van Dam. 2020.
7. Efficient Performance Evaluation of Memory Hierarchy for Highly Multithreaded Graphics Processors [O]. Sara S. Baghsorkhi, Isaac Gelado, Matthieu Delahaye. 2012.
