
Improving the Performance and Time-Predictability of GPUs



Abstract

Graphics Processing Units (GPUs) were originally designed mainly to accelerate graphics applications. Today, their ability to accelerate applications that can be parallelized into a massive number of threads makes GPUs ideal accelerators for such general-purpose workloads. GPUs are also very promising for embedded and real-time applications, which likewise demand high throughput and intensive computation.

However, because of the distinct architecture and programming model of GPUs, two problems must be addressed before GPUs can be exploited further in embedded and real-time applications: how to fully utilize their advanced architectural features to boost performance, and how to analyze the worst-case execution time (WCET) of GPU applications. We propose to apply both architectural modification and static analysis to address these problems. First, we study GPU cache behavior and use bypassing to reduce unnecessary memory traffic and improve performance. The results show that the proposed bypassing method reduces global memory traffic by about 22% and improves performance by about 13% on average. Second, we propose a cache access reordering framework, based on both an architectural extension and static analysis, to improve the predictability of GPU L1 data caches. The evaluation results show that the proposed method provides good predictability in GPU L1 data caches while still allowing dynamic warp scheduling for good performance. Third, based on an analysis of the architecture and dynamic behavior of GPUs, we propose a WCET timing model built on a predictable warp scheduling policy to enable WCET estimation on GPUs. The experimental results show that the proposed WCET analyzer can effectively provide WCET estimates for both soft and hard real-time application purposes.
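As a rough illustration of the cache-bypassing idea described above (a toy model, not the dissertation's actual mechanism), the following Python sketch simulates a direct-mapped cache over a block-address trace and bypasses blocks that a reuse profile marks as single-use; each miss stands in for one global-memory access:

```python
from collections import Counter

def simulate(trace, num_sets, bypass_single_use):
    """Direct-mapped cache over a block-address trace.
    If bypass_single_use is True, blocks that appear only once in the
    trace are never cached (a simple 'no reuse -> bypass' rule)."""
    reuse = Counter(trace)          # static reuse profile of the trace
    cache = {}                      # set index -> resident block
    misses = 0
    for block in trace:
        s = block % num_sets
        if cache.get(s) == block:
            continue                # hit
        misses += 1                 # miss -> one global-memory access
        if bypass_single_use and reuse[block] == 1:
            continue                # bypass: do not pollute the set
        cache[s] = block
    return misses

# Block 0 is reused; blocks 4, 8, 12 stream through the same set once each.
trace = [0, 4, 0, 8, 0, 12, 0]
print(simulate(trace, num_sets=4, bypass_single_use=False))  # 7 misses
print(simulate(trace, num_sets=4, bypass_single_use=True))   # 4 misses
```

Bypassing the streaming blocks keeps the reused block resident, so the miss (and hence memory-traffic) count drops from 7 to 4 in this toy trace; the 22% traffic reduction cited above comes from the dissertation's real benchmarks, not from this sketch.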
Last, we propose to analyze the shared last-level cache (LLC) in integrated CPU-GPU architectures and to integrate that analysis into the WCET analysis of GPU kernels in such systems. The results show that the proposed shared-data LLC analysis method improves the accuracy of shared-LLC miss-rate estimates, which in turn improves the WCET estimates of the GPU kernels.
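To give a feel for why a predictable warp scheduling policy makes WCET estimation tractable, here is a deliberately simplified Python sketch (an illustrative model only, not the dissertation's analyzer): under a strict round-robin scheduler that issues one instruction from every warp per round, each instruction contributes its issue slots for all warps plus one worst-case completion latency per round, so the bound is a plain sum:

```python
def wcet_round_robin(per_inst_wc_latency, num_warps, issue_cycles=1):
    """Toy WCET bound for straight-line code under strict round-robin
    warp scheduling: each instruction costs num_warps issue slots plus
    its worst-case completion latency (e.g., a cache miss) once per
    round.  All names and latencies here are illustrative assumptions."""
    return sum(num_warps * issue_cycles + lat for lat in per_inst_wc_latency)

# 3 instructions: ALU (4 cycles), global load (worst case 400), ALU (4),
# executed by 8 warps: bound = (8+4) + (8+400) + (8+4) = 432 cycles.
print(wcet_round_robin([4, 400, 4], num_warps=8))
```

The point of the sketch is that a fixed, predictable issue order turns the timing model into a composition of per-instruction worst cases; dynamic schedulers break this compositionality, which is why the dissertation pairs its WCET timing model with a predictable warp scheduling policy.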

Bibliographic details

  • Author: Huangfu, Yijie
  • Affiliation: Virginia Commonwealth University
  • Degree grantor: Virginia Commonwealth University
  • Subject: Computer engineering
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 106 p.
  • Format: PDF
  • Language: English
