
Improving the Performance and Time-Predictability of GPUs



Abstract

Graphics Processing Units (GPUs) were originally designed mainly to accelerate graphics applications. Today, their ability to accelerate applications that can be parallelized into a massive number of threads makes GPUs ideal accelerators for such general-purpose workloads. GPUs are also very promising for embedded and real-time applications, which likewise demand high throughput and intensive computation.

However, because of the distinct architecture and programming model of GPUs, two problems must be addressed before GPUs can be exploited further in embedded and real-time applications: how to fully utilize their advanced architectural features to boost performance, and how to analyze the worst-case execution time (WCET) of GPU applications. We propose to apply both architectural modification and static analysis to address these problems. First, we study GPU cache behavior and use bypassing to reduce unnecessary memory traffic and improve performance. The results show that the proposed bypassing method reduces global memory traffic by about 22% and improves performance by about 13% on average. Second, we propose a cache access reordering framework, based on both an architectural extension and static analysis, to improve the predictability of GPU L1 data caches. The evaluation results show that the proposed method provides good predictability in GPU L1 data caches while still allowing dynamic warp scheduling for good performance. Third, based on an analysis of the architecture and dynamic behavior of GPUs, we propose a WCET timing model built on a predictable warp scheduling policy to enable WCET estimation on GPUs. The experimental results show that the proposed WCET analyzer can effectively provide WCET estimates for both soft and hard real-time application purposes.
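As a rough illustration of the cache-bypassing idea described above (a toy model, not the dissertation's actual mechanism), the following Python sketch simulates a direct-mapped cache over a block-address trace and bypasses blocks that a reuse profile marks as single-use; each miss stands in for one global-memory access:

```python
from collections import Counter

def simulate(trace, num_sets, bypass_single_use):
    """Direct-mapped cache over a block-address trace.
    If bypass_single_use is True, blocks that appear only once in the
    trace are never cached (a simple 'no reuse -> bypass' rule)."""
    reuse = Counter(trace)          # static reuse profile of the trace
    cache = {}                      # set index -> resident block
    misses = 0
    for block in trace:
        s = block % num_sets
        if cache.get(s) == block:
            continue                # hit
        misses += 1                 # miss -> one global-memory access
        if bypass_single_use and reuse[block] == 1:
            continue                # bypass: do not pollute the set
        cache[s] = block
    return misses

# Block 0 is reused; blocks 4, 8, 12 stream through the same set once each.
trace = [0, 4, 0, 8, 0, 12, 0]
print(simulate(trace, num_sets=4, bypass_single_use=False))  # 7 misses
print(simulate(trace, num_sets=4, bypass_single_use=True))   # 4 misses
```

Bypassing the streaming blocks keeps the reused block resident, so the miss (and hence memory-traffic) count drops from 7 to 4 in this toy trace; the 22% traffic reduction cited above comes from the dissertation's real benchmarks, not from this sketch.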
Last, we propose to analyze the shared last-level cache (LLC) in integrated CPU-GPU architectures and to integrate that analysis into the WCET analysis of GPU kernels in such systems. The results show that the proposed shared-data LLC analysis method improves the accuracy of shared-LLC miss-rate estimates, which in turn improves the WCET estimates of the GPU kernels.
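To give a feel for why a predictable warp scheduling policy makes WCET estimation tractable, here is a deliberately simplified Python sketch (an illustrative model only, not the dissertation's analyzer): under a strict round-robin scheduler that issues one instruction from every warp per round, each instruction contributes its issue slots for all warps plus one worst-case completion latency per round, so the bound is a plain sum:

```python
def wcet_round_robin(per_inst_wc_latency, num_warps, issue_cycles=1):
    """Toy WCET bound for straight-line code under strict round-robin
    warp scheduling: each instruction costs num_warps issue slots plus
    its worst-case completion latency (e.g., a cache miss) once per
    round.  All names and latencies here are illustrative assumptions."""
    return sum(num_warps * issue_cycles + lat for lat in per_inst_wc_latency)

# 3 instructions: ALU (4 cycles), global load (worst case 400), ALU (4),
# executed by 8 warps: bound = (8+4) + (8+400) + (8+4) = 432 cycles.
print(wcet_round_robin([4, 400, 4], num_warps=8))
```

The point of the sketch is that a fixed, predictable issue order turns the timing model into a composition of per-instruction worst cases; dynamic schedulers break this compositionality, which is why the dissertation pairs its WCET timing model with a predictable warp scheduling policy.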

Bibliographic details

  • Author: Huangfu, Yijie
  • Affiliation: Virginia Commonwealth University
  • Degree grantor: Virginia Commonwealth University
  • Subject: Computer engineering
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 106 p.
  • Format: PDF
  • Language: English
