International Journal of Parallel Programming

RT-CUDA: A Software Tool for CUDA Code Restructuring


Abstract

Recent developments in graphics processing units (GPUs) have opened a new challenge: harnessing their computing power as a general-purpose computing paradigm. However, porting applications to CUDA remains difficult for average programmers, who must package code in separate functions, explicitly manage data transfers between host and device memories, and manually optimize GPU memory utilization. In this paper, we propose a restructuring tool (RT-CUDA) that takes a C-like program and some user directives as compiler hints and produces optimized CUDA code. The tool's strategy is based on efficient management of the memory system to minimize data motion: managing transfers between host and device, maximizing bandwidth for device memory accesses, and enhancing data locality and reuse of cached data using shared memory and registers. Enhanced resource utilization is achieved by rewriting code as parametric kernels and applying efficient auto-tuning. The tool supports calls to numerical libraries (cuBLAS, cuSPARSE, etc.) to help implement scientific simulation applications such as iterative linear algebra solvers. For these applications, the tool implements an inter-block global synchronization that allows execution to span several iterations, which helps balance load and avoid polling. RT-CUDA has been evaluated using a variety of basic linear algebra operators (Madd, MM, MV, VV, etc.) as well as iterative solvers for systems of linear equations, such as the Jacobi and Conjugate Gradient algorithms. Significant speedup has been achieved over other compilers, such as PGI OpenACC and GPGPU compilers, for these applications. The evaluation shows that the generated kernels efficiently call math libraries and enable the implementation of complete iterative solvers. The tool helps scientists develop parallel simulators, such as reservoir simulators and molecular dynamics codes, without exposing them to the complexity of GPU and CUDA programming.
We have a partnership with a group of researchers at Saudi Aramco, a national company in Saudi Arabia, and this group is currently exploring RT-CUDA as a potential development tool for applications involving linear algebra solvers. In addition, senior and graduate students at King Fahd University of Petroleum and Minerals are using RT-CUDA in their projects as part of its continuous enhancement.
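
The abstract mentions rewriting code as parametric kernels and selecting parameters by auto-tuning. The block below is a minimal sketch of that general idea and is not RT-CUDA's generated code. It is written in plain C++ on the host as a stand-in for a CUDA kernel, so it runs without a GPU. The names `matmul_tiled`, `Candidate`, `candidates`, and `autotune`, the candidate tile sizes, and the single-run timing strategy are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical parametric kernel: C = A * B for n x n row-major matrices,
// processed in Tile x Tile blocks. On a GPU each tile would typically be
// staged in shared memory; on the host, blocking only improves cache reuse.
template <std::size_t Tile>
void matmul_tiled(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), 0.0f);
    for (std::size_t ii = 0; ii < n; ii += Tile)
        for (std::size_t kk = 0; kk < n; kk += Tile)
            for (std::size_t jj = 0; jj < n; jj += Tile)
                for (std::size_t i = ii; i < std::min(ii + Tile, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + Tile, n); ++k) {
                        // Loaded once and reused across the inner j loop,
                        // similar in spirit to keeping a value in a register.
                        const float aik = a[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + Tile, n); ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}

using Kernel = void (*)(const std::vector<float>&, const std::vector<float>&,
                        std::vector<float>&, std::size_t);

// One entry per parameter value the tuner is allowed to try.
struct Candidate {
    std::size_t tile;
    Kernel run;
};

// Assumed search space; a real tuner would derive it from hardware limits.
inline std::vector<Candidate> candidates() {
    return {{8, matmul_tiled<8>}, {16, matmul_tiled<16>}, {32, matmul_tiled<32>}};
}

// Exhaustive auto-tuning: time every candidate once on an n x n problem and
// return the tile size of the fastest run.
inline std::size_t autotune(std::size_t n) {
    std::vector<float> a(n * n, 1.0f), b(n * n, 2.0f), c(n * n, 0.0f);
    std::size_t best_tile = 0;
    double best_seconds = -1.0;
    for (const Candidate& cand : candidates()) {
        const auto start = std::chrono::steady_clock::now();
        cand.run(a, b, c, n);
        const auto stop = std::chrono::steady_clock::now();
        const double seconds = std::chrono::duration<double>(stop - start).count();
        if (best_seconds < 0.0 || seconds < best_seconds) {
            best_seconds = seconds;
            best_tile = cand.tile;
        }
    }
    return best_tile;
}
```

Calling `autotune(256)` returns whichever of the assumed tile sizes ran fastest on the current machine; the result is timing-dependent and may differ between runs. In a CUDA setting the same pattern would time kernel launches with different block and tile parameters instead of host loops.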
