Warp-Based Load/Store Reordering to Improve GPU Data Cache Time Predictability and Performance

Abstract

Graphics Processing Units (GPUs) have great potential to improve the performance and energy efficiency of data-parallel real-time applications. However, it is very difficult to compute the worst-case execution time (WCET) on current GPUs, which are designed to improve average-case throughput rather than time predictability. In this paper, we propose a warp-based load/store reordering mechanism that improves the time predictability of GPU data caching without incurring much performance overhead. This mechanism can be used in conjunction with dynamic warp scheduling to achieve better performance than pure round-robin scheduling, while enabling accurate static timing analysis to bound the worst-case number of GPU L1 data cache misses.
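The abstract does not spell out how the reordering works, so the following is only an illustrative sketch of the general idea, not the authors' design. It assumes that each memory request carries a warp ID and an instruction index, that requests are buffered, and that they are released to the cache in a fixed (instruction index, warp ID) order. The names `lru_misses` and `reorder_by_warp`, the tuple layout, and the two-line fully associative LRU cache are all hypothetical choices made for the example.

```python
from collections import OrderedDict


def lru_misses(addresses, num_lines=2):
    """Count misses in a small fully associative LRU cache (toy model)."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return misses


def reorder_by_warp(requests):
    """Release buffered requests in a fixed (instruction, warp ID) order.

    Each request is a (warp_id, instr_index, address) tuple. This ordering
    rule is an assumption made for illustration only.
    """
    return sorted(requests, key=lambda r: (r[1], r[0]))


def addresses_of(requests):
    return [addr for _, _, addr in requests]


# Two hypothetical dynamic schedules issuing the same six requests
# in different orders (program order within each warp is preserved).
schedule_a = [(0, 0, 10), (1, 0, 20), (2, 0, 10), (0, 1, 30), (1, 1, 10), (2, 1, 20)]
schedule_b = [(1, 0, 20), (1, 1, 10), (0, 0, 10), (2, 0, 10), (2, 1, 20), (0, 1, 30)]

for name, schedule in (("A", schedule_a), ("B", schedule_b)):
    raw = lru_misses(addresses_of(schedule))
    fixed = lru_misses(addresses_of(reorder_by_warp(schedule)))
    print(f"schedule {name}: misses in issue order = {raw}, after reordering = {fixed}")
```

In this toy model the miss count in issue order depends on the schedule, while the reordered access sequence is the same for both schedules, so a static analysis would only need to reason about one sequence. Reordering makes the count predictable; it does not necessarily make it smaller.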

Bibliographic Information

Source
IEEE International Symposium on Real-Time Distributed Computing | 2016 | pp. 166-173 | 8 pages
Authors
Yijie Huangfu; Wei Zhang
Original format
PDF
Keywords
Graphics processing units; Kernel; Dynamic scheduling; Real-time systems; Timing; Out of order

Similar Literature

1. APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs [J]. Yunho Oh, Keunsoo Kim, Myung Kuk Yoon. Computer Architecture News, 2016, No. 3.
2. Predictability and Performance Aware Replacement Policy PVISAM for Unified Shared Caches in Real-time Multicores [J]. Mohammad Shihabul Haque, Arvind Easwaran. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, No. 11.
3. Improving Data Cache Performance with Integrated Use of Split Caches, Victim Cache and Stream Buffers [J]. Afrin Naz, Mehran Rezaei, Krishna Kavi. Computer Architecture News, 2005, No. 3.
4. Warp-Based Load/Store Reordering to Improve GPU Data Cache Time Predictability and Performance [C]. Yijie Huangfu, Wei Zhang. IEEE International Symposium on Real-Time Distributed Computing, 2016.
5. Improving the Performance and Time-Predictability of GPUs [D]. Yijie Huangfu. 2017.
6. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs [O]. Liang-Tsung Huang, Chao-Chin Wu, Lien-Fu Lai.
7. Load-Store Reordering for Low-Power Multimedia Data Transfers [O]. Woongki Baek, Jihong Kim. 2012.
