IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Orchestrating Cache Management and Memory Scheduling for GPGPU Applications


Abstract

Modern graphics processing units (GPUs) deliver tremendous computing horsepower by running tens of thousands of threads concurrently. This massively parallel execution model has been effective in hiding the long latency of off-chip memory accesses in graphics and other general computing applications that exhibit regular memory behavior. With the fast-growing demand for general-purpose computing on GPUs (GPGPU), GPU workloads are becoming highly diversified and thus require synergistic coordination of both computing and memory resources to unleash the computing power of GPUs. Accordingly, recent graphics processors have begun to integrate an on-die level-2 (L2) cache. The huge number of threads on GPUs, however, poses significant challenges to L2 cache design. Experiments on a variety of GPGPU applications reveal that the L2 cache may or may not improve overall performance, depending on the characteristics of the application. In this paper, we propose efficient techniques to improve GPGPU performance by orchestrating both the L2 cache and memory in a unified framework. The basic philosophy is to exploit the temporal locality among the massive number of concurrent memory requests and to minimize the impact of memory divergence among simultaneously executed groups of threads. Our contributions are twofold. First, a priority-based cache management scheme is proposed to maximize the chance that frequently revisited data are kept in the cache. Second, an effective memory scheduling scheme is introduced that reorders memory requests in the memory controller according to their divergence behavior, reducing the average waiting time of warps. Simulation results reveal that our techniques enhance overall performance by 10% on average for memory-intensive benchmarks, with gains of up to 30%.
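
The abstract does not detail the exact priority policy, but the idea of keeping frequently revisited data resident can be illustrated with a small sketch. The following Python fragment is a hypothetical, simplified model of one cache set: each line carries a saturating priority counter, hits promote a line, new lines enter at low priority, and the victim is the lowest-priority line, so streaming data ages out while hot data stays. The class name PriorityCache and all parameters are illustrative, not the paper's implementation.

    class PriorityCache:
        """One set of a priority-based cache (illustrative sketch only).
        Hits promote a line's priority; misses insert at low priority and
        evict the lowest-priority line, biasing the set toward keeping
        frequently revisited data."""

        def __init__(self, num_lines=8, max_priority=3):
            self.num_lines = num_lines
            self.max_priority = max_priority
            self.lines = {}  # tag -> priority counter

        def access(self, tag):
            if tag in self.lines:
                # Hit: promote, so frequently revisited data tends to stay.
                self.lines[tag] = min(self.lines[tag] + 1, self.max_priority)
                return True
            if len(self.lines) >= self.num_lines:
                # Miss with a full set: evict the lowest-priority line.
                victim = min(self.lines, key=self.lines.get)
                del self.lines[victim]
            # New lines enter at priority 0, so one-touch data ages out quickly.
            self.lines[tag] = 0
            return False

For example, replaying the tag stream 1, 2, 1, 3, 1, 4, 5 against a four-line set leaves tag 1 resident, because its repeated hits raise its priority above that of the streaming tags.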
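
The divergence-aware scheduling idea can be sketched in a similarly simplified way. A warp resumes only after all of its outstanding memory requests complete, so serving warps with fewer pending requests first, while keeping each warp's requests together, lowers the average warp waiting time (shortest-job-first at warp granularity). This is an assumed reading of the abstract's description; schedule_requests and the (warp_id, address) request format are hypothetical, and the sketch ignores row-buffer locality and other real DRAM controller constraints.

    from collections import defaultdict

    def schedule_requests(pending):
        """Reorder (warp_id, address) requests so that low-divergence warps
        (those with few outstanding requests) are served first and each
        warp's requests stay together. Illustrative sketch only."""
        by_warp = defaultdict(list)
        for warp_id, addr in pending:
            by_warp[warp_id].append(addr)
        order = []
        # Warps with the fewest requests finish soonest; schedule them first.
        for warp_id in sorted(by_warp, key=lambda w: len(by_warp[w])):
            order.extend((warp_id, addr) for addr in by_warp[warp_id])
        return order

With pending = [(0, 0x100), (1, 0x200), (0, 0x140), (0, 0x180), (2, 0x300)], warps 1 and 2 are each satisfied after a single request before warp 0's three requests are issued, so two of the three warps stop waiting early and the average waiting time drops.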
