Journal: Concurrency, Practice and Experience

Incorporating selective victim cache into GPGPU for high-performance computing


Abstract

Contemporary general-purpose graphic processing units (GPGPUs) successfully parallelize an application into thousands of concurrent threads with remarkably improved performance. Such massive threads compete for the small-sized first-level data (L1D) cache, leading to an exaggerated cache-thrashing problem, which may degrade the overall performance significantly. In this paper, we propose a selective victim cache design to enable better data locality and higher performance. Instead of a small fully associative structure, we first redesign the victim cache as a set-associative structure that is equivalent to the original L1D cache, to suit GPGPU applications with massive concurrent threads. To keep the most frequently used data in L1D for better operand service, we apply a simple prediction scheme to avoid costly block interchanges and evictions. To further save the area for data storage, we propose to leverage the unallocated registers and shared memory entries to hold the victim cache data. The experiments demonstrate that our proposed approach can increase the on-chip data cache hit rate considerably and deliver better performance with negligible changes to the baseline GPGPU architecture. For example, our selective victim cache design can improve the performance by 41.3% on average, achieving a 54.7% increase in data cache hit rate and a 21.8% reduction in block interchanges and evictions.
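The sketch below is a minimal illustration of the mechanism the abstract describes, not the paper's implementation: an L1D cache backed by a victim cache, with a small reuse counter acting as the prediction scheme that decides whether a victim-cache hit is promoted (swapped) back into L1D or simply served in place, avoiding the costly block interchange. For brevity both caches are modeled as direct-mapped here, whereas the paper makes the victim cache set-associative and equivalent to L1D; all sizes, the 2-bit counter, and the promotion threshold are assumptions made for the example (compile with -std=c++14 or later).

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <utility>

constexpr int LINE = 128;  // bytes per cache line (assumed)
constexpr int SETS = 32;   // number of sets in L1D and in the victim cache (assumed)

struct Cline { uint64_t tag = 0; bool valid = false; uint8_t reuse = 0; };

std::array<Cline, SETS> l1d, victim;

enum class Result { L1Hit, VictimHit, Miss };

Result access(uint64_t addr) {
    uint64_t set = (addr / LINE) % SETS;
    uint64_t tag = (addr / LINE) / SETS;

    if (l1d[set].valid && l1d[set].tag == tag)
        return Result::L1Hit;

    if (victim[set].valid && victim[set].tag == tag) {
        // Selective promotion: only swap the block back into L1D when its reuse
        // counter predicts further reuse; otherwise serve the operand in place
        // and skip the costly interchange (the "selective" part of the design).
        if (++victim[set].reuse >= 2) {
            std::swap(l1d[set], victim[set]);
            l1d[set].reuse = 0;
        }
        return Result::VictimHit;
    }

    // Miss in both: the displaced L1D line moves to the victim cache instead of
    // being dropped, and the new block fills L1D.
    if (l1d[set].valid)
        victim[set] = l1d[set];
    l1d[set] = {tag, true, 0};
    return Result::Miss;
}

int main() {
    // Two addresses mapping to the same set would thrash a lone L1D; the victim
    // cache keeps the displaced line on chip, so later accesses still hit.
    uint64_t a = 0x1000, b = 0x1000 + SETS * LINE;
    const char* names[] = {"L1 hit", "victim hit", "miss"};
    for (uint64_t addr : {a, b, a, b, a, b})
        std::printf("0x%llx -> %s\n", (unsigned long long)addr,
                    names[static_cast<int>(access(addr))]);
}
```

Running the sketch on the two conflicting addresses prints two initial misses followed by victim-cache and L1 hits, showing how the conflict-evicted line stays serviceable on chip while the reuse counter defers the block interchange until the line has proven its reuse.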

