IEEE SoutheastCon

A talented CPU-to-GPU memory mapping technique



Abstract

Fast, effective analysis of large systems requires high-performance computing (HPC). The NVIDIA Compute Unified Device Architecture (CUDA) platform, which couples a central processing unit (CPU) with a graphics processing unit (GPU), has proven its potential for HPC. In CPU/GPU computing, data and instructions are first copied from CPU main memory to GPU global memory. Inside the GPU, it is more beneficial to keep data in shared memory (visible only to the threads of a single block) than in global memory (visible to all threads). However, GPU shared memory is far smaller than GPU global memory: on a Fermi Tesla C2075, shared memory is 48 KB per block while global memory totals 5.6 GB. In this paper, we introduce a CPU-main-memory to GPU-global-memory mapping technique that improves GPU and overall system performance by increasing the effectiveness of GPU shared memory. Experimental results from solving Laplace's equation for a 512×512 matrix on Fermi and Kepler cards show that the proposed CPU-to-GPU memory mapping technique decreases overall execution time by more than 75%.
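The paper's benchmark, solving Laplace's equation on a 512×512 grid, amounts to an iterative stencil computation such as Jacobi relaxation. A minimal CPU reference sketch follows; this is not the authors' GPU code, and the grid size, iteration count, and boundary condition are illustrative assumptions. On a GPU, each sweep of this stencil is the kernel whose neighbour reads benefit from being staged in fast on-chip shared memory rather than re-fetched from global memory, which is the tradeoff the proposed mapping technique targets.

```python
import numpy as np

def jacobi_laplace(n=512, iterations=100):
    """Jacobi iteration for Laplace's equation on an n x n grid.

    Each sweep replaces every interior point with the average of its
    four neighbours. In a CUDA implementation, a thread block would
    first copy its tile of the grid from global memory into shared
    memory, so the four neighbour reads per point hit on-chip memory.
    """
    u = np.zeros((n, n))
    u[0, :] = 100.0  # illustrative boundary condition: fixed hot top edge
    for _ in range(iterations):
        # Vectorized five-point stencil over the interior points only;
        # the boundary rows/columns are left untouched each sweep.
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
    return u

grid = jacobi_laplace(n=64, iterations=200)
```

Each sweep reads four neighbours per interior point, so on a GPU the same global-memory value is needed by up to four threads; staging a block's tile in shared memory turns those repeated global loads into on-chip reads, which is why the shared-memory capacity limit (48 KB per block on the C2075) matters.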

