IEEE International Symposium on Parallel and Distributed Computing

Analyzing Memory Access on CPU-GPGPU Shared LLC Architecture



Abstract

Data exchange between GPGPUs and CPUs is becoming more and more important nowadays. One industry trend for alleviating the long latency is to integrate CPUs and GPGPUs on a single chip. In this paper, we analyze the reference interactions between CPU and GPGPU applications with a CPU-GPGPU co-simulator that integrates gem5 and gpgpu-sim. Since the memory controllers are shared among all cores, we observe severe memory contention between them: CPU applications suffer a 1.26x slowdown and 64.79% blocked time in main memory when they run in parallel with GPGPU applications. To alleviate the contention and provide more memory bandwidth, shared last-level caches (LLCs) are commonly employed in such systems. We evaluate a banked shared LLC structure implemented in the co-simulator. We show that a simple shared LLC benefits mostly the GPGPU (2.13x speedup over running alone and 1.7x over running in parallel) rather than the CPU. With the help of the LLC, the memory requests issued to main memory are reduced to 30.74% and the blocked time to 49.64%, which provides more memory bandwidth. The latency-sensitive CPU applications still suffer, because LLC buffer occupancy is very high when they run in parallel with the GPGPU. Besides, as the number of LLC cache banks grows, we reveal that the CPU achieves a higher speedup than the GPGPU from the increased LLC parallelism. Finally, we also discuss the impact of the GPGPU L2 cache, and we find that fewer GPGPU L2 cache banks lower performance because they limit the parallelism of the GPGPU. The observations and inferences in this paper may serve as a reference guide for future CPU-GPGPU shared LLC design.
