A Cache Hierarchy Aware Thread Mapping Methodology for GPGPUs

Lai Bo-Cheng Charles; Kuo Hsien-Kai; Jou Jing-Yang

Abstract

The recently proposed GPGPU architecture adds a multi-level hierarchy of shared caches to better exploit the data locality of general-purpose applications. The GPGPU design philosophy allocates most of the chip area to processing cores, which leaves a relatively small cache shared by a large number of cores compared with conventional multi-core CPUs. Applying a proper thread mapping scheme is crucial for gaining from constructive cache sharing and avoiding resource contention among thousands of threads. However, because of significant differences in architectures and programming models, existing thread mapping approaches for multi-core CPUs do not perform as effectively on GPGPUs. This paper proposes a formal model that captures both the characteristics of threads and the cache sharing behavior of a multi-level shared cache. With appropriate proofs, the model forms a solid theoretical foundation for the proposed cache hierarchy aware thread mapping methodology for multi-level shared cache GPGPUs. The experiments show that the three-stage thread mapping methodology improves data reuse at each cache level of GPGPUs and achieves an average runtime improvement of 2.3× to 4.3× compared with existing approaches.

Bibliographic details

Source
Computers, IEEE Transactions on, 2015, Issue 4, pp. 884-898 (15 pages)
Authors
Lai Bo-Cheng Charles; Kuo Hsien-Kai; Jou Jing-Yang
Affiliation

Department of Electronics Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, ROC;

Original format: PDF
Language: English
Keywords
Arrays; Graphics processing units; Instruction sets; Kernel; Message systems; Optimization; Multithreaded processors; cache memories; performance analysis and design aids; shared memory;
