Journal: Concurrency, Practice and Experience

Incorporating selective victim cache into GPGPU for high-performance computing


Abstract

Contemporary general-purpose graphic processing units (GPGPUs) successfully parallelize an application into thousands of concurrent threads with remarkably improved performance. Such massive threads compete for the small-sized first-level data (L1D) cache, leading to an exaggerated cache-thrashing problem, which may degrade the overall performance significantly. In this paper, we propose a selective victim cache design to enable better data locality and higher performance. Instead of a small fully associative structure, we first redesign the victim cache as a set-associative structure that is equivalent to the original L1D cache, to suit GPGPU applications with massive concurrent threads. To keep the most frequently used data in L1D for better operand service, we apply a simple prediction scheme to avoid costly block interchanges and evictions. To further save the area for data storage, we propose to leverage the unallocated registers and shared memory entries to hold the victim cache data. The experiments demonstrate that our proposed approach can increase the on-chip data cache hit rate considerably and deliver better performance with negligible changes to the baseline GPGPU architecture. For example, our selective victim cache design can improve the performance by 41.3% on average, achieving a 54.7% increase in data cache hit rate and a 21.8% reduction in block interchanges and evictions.
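The sketch below is a minimal illustration of the mechanism the abstract describes, not the paper's implementation: an L1D cache backed by a victim cache, with a small reuse counter acting as the prediction scheme that decides whether a victim-cache hit is promoted (swapped) back into L1D or simply served in place, avoiding the costly block interchange. For brevity both caches are modeled as direct-mapped here, whereas the paper makes the victim cache set-associative and equivalent to L1D; all sizes, the 2-bit counter, and the promotion threshold are assumptions made for the example (compile with -std=c++14 or later).

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <utility>

constexpr int LINE = 128;  // bytes per cache line (assumed)
constexpr int SETS = 32;   // number of sets in L1D and in the victim cache (assumed)

struct Cline { uint64_t tag = 0; bool valid = false; uint8_t reuse = 0; };

std::array<Cline, SETS> l1d, victim;

enum class Result { L1Hit, VictimHit, Miss };

Result access(uint64_t addr) {
    uint64_t set = (addr / LINE) % SETS;
    uint64_t tag = (addr / LINE) / SETS;

    if (l1d[set].valid && l1d[set].tag == tag)
        return Result::L1Hit;

    if (victim[set].valid && victim[set].tag == tag) {
        // Selective promotion: only swap the block back into L1D when its reuse
        // counter predicts further reuse; otherwise serve the operand in place
        // and skip the costly interchange (the "selective" part of the design).
        if (++victim[set].reuse >= 2) {
            std::swap(l1d[set], victim[set]);
            l1d[set].reuse = 0;
        }
        return Result::VictimHit;
    }

    // Miss in both: the displaced L1D line moves to the victim cache instead of
    // being dropped, and the new block fills L1D.
    if (l1d[set].valid)
        victim[set] = l1d[set];
    l1d[set] = {tag, true, 0};
    return Result::Miss;
}

int main() {
    // Two addresses mapping to the same set would thrash a lone L1D; the victim
    // cache keeps the displaced line on chip, so later accesses still hit.
    uint64_t a = 0x1000, b = 0x1000 + SETS * LINE;
    const char* names[] = {"L1 hit", "victim hit", "miss"};
    for (uint64_t addr : {a, b, a, b, a, b})
        std::printf("0x%llx -> %s\n", (unsigned long long)addr,
                    names[static_cast<int>(access(addr))]);
}
```

Running the sketch on the two conflicting addresses prints two initial misses followed by victim-cache and L1 hits, showing how the conflict-evicted line stays serviceable on chip while the reuse counter defers the block interchange until the line has proven its reuse.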

