Journal: Applied Computational Electromagnetics Society journal

Improved Performance of FDTD Computation Using a Thread Block Constructed as a Two-Dimensional Array with CUDA

Abstract

In a previous study, the authors proposed a finite-difference time-domain (FDTD) implementation for a compute unified device architecture (CUDA) compatible graphics processing unit (GPU) using a thread block constructed as a two-dimensional (2-D) array. However, the computational speed of the 2-D FDTD simulation on the GPU was found to decrease as the computational domain grew. In the present paper, the authors investigate how computational performance depends on the size of a thread block constructed as a 2-D array, and improve the performance of the implementation. As a result, regardless of the size of the computational domain, the computational speed on a single GPU (NVIDIA GeForce GTX 280) reached approximately 30.0 Gflops, approximately 20 times faster than a single core of a central processing unit (Intel 3.0-GHz Core 2 Duo). The improved performance is approximately 65% of the theoretical peak performance (47.23 Gflops) derived from the theoretical memory bandwidth (141.7 GB/s).
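
The abstract does not include code, so the following is only a minimal sketch of the general idea: a 2-D FDTD update in which each thread block is a 2-D array of threads and each thread updates one cell. Everything specific here is an assumption made for illustration and is not taken from the paper: the TM-z field layout (`ez`, `hx`, `hy`), the normalized update coefficient `c`, the zero-field boundary, the point-source initial condition, the helper names `grid_for` and `run_fdtd`, and the 16 x 16 block shape (`BLOCK_X`, `BLOCK_Y`), which is not the tuned size the authors report.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block shape (illustrative only; the paper studies this choice).
constexpr int BLOCK_X = 16;
constexpr int BLOCK_Y = 16;

// Update Hx and Hy from Ez. Arrays are row-major with x fastest, so that
// consecutive threadIdx.x values touch consecutive addresses.
__global__ void update_h(float* hx, float* hy, const float* ez,
                         int nx, int ny, float c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // x index of this cell
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // y index of this cell
    if (i >= nx || j >= ny) return;                 // guard partial blocks
    int idx = j * nx + i;
    if (j < ny - 1) hx[idx] -= c * (ez[idx + nx] - ez[idx]);
    if (i < nx - 1) hy[idx] += c * (ez[idx + 1] - ez[idx]);
}

// Update Ez from Hx and Hy. Edge cells are skipped, so Ez stays zero there.
__global__ void update_e(float* ez, const float* hx, const float* hy,
                         int nx, int ny, float c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < 1 || j < 1 || i >= nx - 1 || j >= ny - 1) return;
    int idx = j * nx + i;
    ez[idx] += c * ((hy[idx] - hy[idx - 1]) - (hx[idx] - hx[idx - nx]));
}

// Number of 2-D blocks needed to cover an nx-by-ny domain (ceil division).
inline dim3 grid_for(int nx, int ny)
{
    return dim3((nx + BLOCK_X - 1) / BLOCK_X, (ny + BLOCK_Y - 1) / BLOCK_Y);
}

// Run `steps` time steps from a unit Ez value at the domain centre and
// return the final Ez field on the host.
std::vector<float> run_fdtd(int nx, int ny, int steps)
{
    size_t bytes = static_cast<size_t>(nx) * ny * sizeof(float);
    float *ez = nullptr, *hx = nullptr, *hy = nullptr;
    cudaMalloc(&ez, bytes);
    cudaMalloc(&hx, bytes);
    cudaMalloc(&hy, bytes);
    cudaMemset(ez, 0, bytes);
    cudaMemset(hx, 0, bytes);
    cudaMemset(hy, 0, bytes);

    const float one = 1.0f;
    cudaMemcpy(ez + (ny / 2) * nx + nx / 2, &one, sizeof(float),
               cudaMemcpyHostToDevice);

    const float c = 0.5f;              // assumed Courant number (< 1/sqrt(2))
    const dim3 block(BLOCK_X, BLOCK_Y);  // thread block as a 2-D array
    const dim3 grid = grid_for(nx, ny);
    for (int n = 0; n < steps; ++n) {
        update_h<<<grid, block>>>(hx, hy, ez, nx, ny, c);
        update_e<<<grid, block>>>(ez, hx, hy, nx, ny, c);
    }

    std::vector<float> out(static_cast<size_t>(nx) * ny);
    cudaMemcpy(out.data(), ez, bytes, cudaMemcpyDeviceToHost);
    cudaFree(ez);
    cudaFree(hx);
    cudaFree(hy);
    return out;
}
```

A call such as `run_fdtd(1024, 1024, 100)` would launch a 64 x 64 grid of 16 x 16 blocks per kernel. Stencil kernels of this kind do very little arithmetic per byte moved, which fits the abstract's choice to derive the peak from memory bandwidth: the quoted figures correspond to about one floating-point operation per three bytes of traffic (47.23 / 141.7 ≈ 0.333), although the abstract does not spell out the traffic model behind that ratio.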
