Improved Performance of FDTD Computation Using a Thread Block Constructed as a Two-Dimensional Array with CUDA

Naoki Takada; Tomoyoshi Shimobaba; Nobuyuki Masuda; Tomoyoshi Ito

Abstract

In a previous study, the authors proposed a finite-difference time-domain (FDTD) implementation for a graphics processing unit (GPU) compatible with the Compute Unified Device Architecture (CUDA), using a thread block constructed as a two-dimensional (2-D) array. However, the larger the computational domain of the 2-D FDTD simulation on the GPU, the slower the computation became. In the present paper, the authors investigated how computational performance depends on the size of a thread block constructed as a 2-D array, and improved the performance of the implementation accordingly. As a result, regardless of the size of the computational domain, a single GPU (NVIDIA GeForce GTX 280) achieved approximately 30.0 Gflops, approximately 20 times faster than a single core of a central processing unit (Intel Core 2 Duo, 3.0 GHz). This improved performance is approximately 65% of the theoretical peak performance (47.23 Gflops) derived from the theoretical memory bandwidth (141.7 GB/s).
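The abstract does not reproduce the kernel. What follows is a minimal sketch of what a thread block constructed as a 2-D array means for a 2-D FDTD update. It assumes a TMz Yee scheme in normalized units with perfect-electric-conductor walls.

The sketch emulates a CUDA launch on the CPU in plain Python:

- Each emulated thread computes its cell index as i = blockIdx.x * blockDim.x + threadIdx.x, and likewise for j.
- Each thread then updates exactly one Yee cell.
- The block shape (bx, by) is the parameter whose effect the paper studies.

The grid size, block shapes, Courant number and Gaussian source are hypothetical, as are the helper names `update_h`, `update_e`, `launch_2d` and `run`. This is not the authors' CUDA code.

The abstract's own figures also imply about 3 bytes of memory traffic per floating-point operation (141.7 GB/s / 47.23 Gflops ≈ 3.0). That ratio is why the peak is set by memory bandwidth rather than by arithmetic throughput.

```python
"""Hypothetical sketch: a 2-D TMz FDTD (Yee) update whose cell loop is organized
like a CUDA launch with 2-D thread blocks of shape (bx, by). Grid size, block
shapes, source and Courant number are illustrative assumptions, not the paper's."""
import math

NX, NY = 48, 40   # hypothetical grid size (cells)
S = 0.5           # Courant number c*dt/dx (assumed); 2-D stability needs S <= 1/sqrt(2)


def idx(i, j):
    # Row-major layout: i (x) varies fastest, so threads adjacent in threadIdx.x
    # touch adjacent addresses, as a coalescing-friendly CUDA kernel would.
    return j * NX + i


def update_h(ez, hx, hy, i, j):
    # Body that one CUDA thread (i, j) would run in the H-field kernel.
    if j < NY - 1:
        hx[idx(i, j)] -= S * (ez[idx(i, j + 1)] - ez[idx(i, j)])
    if i < NX - 1:
        hy[idx(i, j)] += S * (ez[idx(i + 1, j)] - ez[idx(i, j)])


def update_e(ez, hx, hy, i, j):
    # Interior Ez only; edge cells stay 0 (perfect electric conductor walls).
    if 1 <= i < NX - 1 and 1 <= j < NY - 1:
        ez[idx(i, j)] += S * ((hy[idx(i, j)] - hy[idx(i - 1, j)])
                              - (hx[idx(i, j)] - hx[idx(i, j - 1)]))


def launch_2d(cell_fn, fields, bx, by):
    """Emulate kernel<<<grid, block>>> with blockDim = (bx, by) on the CPU."""
    grid_x = (NX + bx - 1) // bx   # ceiling division, as when sizing a CUDA grid
    grid_y = (NY + by - 1) // by
    for block_y in range(grid_y):
        for block_x in range(grid_x):
            for ty in range(by):
                for tx in range(bx):
                    i = block_x * bx + tx   # blockIdx.x * blockDim.x + threadIdx.x
                    j = block_y * by + ty   # blockIdx.y * blockDim.y + threadIdx.y
                    if i < NX and j < NY:   # guard for partially filled edge blocks
                        cell_fn(*fields, i, j)


def run(steps, bx, by):
    ez, hx, hy = ([0.0] * (NX * NY) for _ in range(3))
    for n in range(steps):
        # Separate H and E "kernels", so each sweep reads only the other field.
        launch_2d(update_h, (ez, hx, hy), bx, by)
        launch_2d(update_e, (ez, hx, hy), bx, by)
        # Additive Gaussian source at the grid centre (hypothetical excitation).
        ez[idx(NX // 2, NY // 2)] += math.exp(-((n - 20) / 6.0) ** 2)
    return ez


if __name__ == "__main__":
    ez_16x8 = run(40, 16, 8)
    ez_32x4 = run(40, 32, 4)   # 48 is not a multiple of 32, so the edge guard is used
    print(ez_16x8 == ez_32x4)
```

Run as a script, it prints `True`. Within one field sweep every cell update is independent, so changing the block shape only changes which emulated thread handles which cell, not the numerical result. On a real GPU, by contrast, the block shape affects how global-memory accesses are grouped and how many threads are resident per multiprocessor. It can therefore change speed without changing the numerics.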


Bibliographic Information

Source: Applied Computational Electromagnetics Society Journal, 2010, issue 12, pp. 1061-1069 (9 pages)
Authors: Naoki Takada; Tomoyoshi Shimobaba; Nobuyuki Masuda; Tomoyoshi Ito
Affiliations

Department of Informatics and Media Technology, Sony Institute of Higher Education, Shohoku College, 428 Nurumizu, Atsugi, Kanagawa 243-8501, JAPAN;

Graduate School of Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, Chiba 263-8522, JAPAN;
Original format: PDF
Language: English
Keywords
finite-difference time-domain method; GPU computing; graphics processing unit; high-performance computing;


Similar Documents


1. Avoiding Duplicated Computation to Improve The Performance of PFSP on CUDA GPUs [J]. Chao-Chin Wu, Kai-Cheng Wei, Wei-Shen Lai. Computer Science & Information Technology, 2016, no. 7

2. Performance of the improved PML for the envelope ADI-FDTD method in two-dimensional domain [J]. Shu-Hai Sun, Choi C.T.M. IEEE Microwave and Wireless Components Letters, 2005, no. 11

3. Improved Architecture of FDTD/FIT Dedicated Computer for Higher Performance Computation [J]. Kawaguchi H., Fujita Y., Fujishima Y. IEEE Transactions on Magnetics, 2008, no. 6

4. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering [C]. Wende Florian, Cordes Frank, Steinke Thomas. 2012 Symposium on Application Accelerators in High Performance Computing, 2012

5. Computational electromagnetics modeling of two-dimensional man-made 90 degree wedge-type of structures using the FDTD numerical method [D]. Demetriou, Demetrakis P., 2001

6. Two-Dimensional Hybrid Composites of SnS2 Nanosheets Array Film with Graphene for Enhanced Photoelectric Performance [O]. Feier Fang, Henan Li, Huizhen Yao, 2019

7. Comparison of Computational Performance between CUDA C and CUDA Fortran in High-Speed FDTD Simulation Using GPGPU [O]. 高原勝平, 今井卓, 田口健治, 2012
