Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

Abstract

The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of the propagation of longitudinal and transverse waves in stratified media. This work demonstrates the potential of the scheme and the relevance of each acceleration strategy for massive FDTD computations. We propose two new implementations of the two-dimensional FDTD scheme, using multi-CPU and multi-GPU respectively. In the first implementation, an open-source Message Passing Interface implementation (OMPI) is used to fully exploit the resources of a dual-processor workstation with two Intel Xeon processors. In addition, the CPU code version includes the Streaming SIMD Extensions (SSE) and the Advanced Vector Extensions (AVX), combined with shared memory approaches that take advantage of multi-core platforms. The second implementation, the multi-GPU code version, is based on the Peer-to-Peer communications available in CUDA, running on two GPUs (NVIDIA GTX 670). The paper then analyses the influence of the different code versions, including shared memory approaches, vector instructions and multiple processors (both CPU and GPU), and compares them in order to quantify the improvement obtained from distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed, and it is shown that adding shared memory schemes to CPU computing substantially improves the performance of vector instructions, enlarging the range of simulation sizes that use the CPU cache memory efficiently. In this case, GPU computing is roughly twice as fast as the fine-tuned CPU version, for both one and two nodes.
However, for massive computations explicit vector instructions are not worthwhile, since memory bandwidth is the limiting factor and the performance tends towards that of the sequential version with auto-vectorisation and the shared memory approach. In this scenario GPU computing is the best option, since it provides homogeneous behaviour. More specifically, the speedup of GPU computing reaches an upper limit of 12 for both one and two GPUs, while performance peaks at 80 GFlops and 146 GFlops for one GPU and two GPUs respectively. Finally, the method is applied to an earth crust profile in order to demonstrate the potential of our approach and the need for acceleration strategies in this type of application. (C) 2015 Elsevier B.V. All rights reserved.
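
The idea of distributing an FDTD grid over two processors can be illustrated with a minimal sketch. It is not the scheme from the paper: it assumes a scalar two-dimensional wave equation with fixed boundaries instead of the vibroacoustic formulation, and a plain list copy stands in for the MPI or CUDA Peer-to-Peer transfer. All function names (`initial_grid`, `step`, `run_single`, `run_split`) and parameters are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: scalar 2D wave equation, NOT the paper's
# vibroacoustic scheme. The list copies in run_split stand in for the
# MPI / CUDA Peer-to-Peer transfers mentioned in the abstract.

def initial_grid(n_rows, n_cols):
    """Grid at rest with a single unit displacement as the source."""
    u = [[0.0] * n_cols for _ in range(n_rows)]
    u[n_rows // 4][n_cols // 2] = 1.0
    return u

def step(u_prev, u, c2):
    """One leapfrog time step; c2 = (c*dt/dx)**2. The outer ring of cells
    is not updated here (fixed boundary, or halo filled by the caller)."""
    u_next = [row[:] for row in u]
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            lap = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   - 4.0 * u[i][j])
            u_next[i][j] = 2.0 * u[i][j] - u_prev[i][j] + c2 * lap
    return u_next

def run_single(n_rows, n_cols, steps, c2):
    """Reference run on one undivided grid."""
    u = initial_grid(n_rows, n_cols)
    u_prev = [row[:] for row in u]
    for _ in range(steps):
        u_prev, u = u, step(u_prev, u, c2)
    return u

def run_split(n_rows, n_cols, steps, c2, m):
    """Same run with the rows split at index m (2 <= m <= n_rows - 2).
    Each half keeps one extra halo row owned by the other half."""
    full = initial_grid(n_rows, n_cols)
    top = [row[:] for row in full[:m + 1]]   # rows 0..m-1 plus halo row m
    bot = [row[:] for row in full[m - 1:]]   # halo row m-1 plus rows m..end
    top_prev = [row[:] for row in top]
    bot_prev = [row[:] for row in bot]
    for _ in range(steps):
        top_next = step(top_prev, top, c2)
        bot_next = step(bot_prev, bot, c2)
        # Halo exchange: each half receives the neighbour's boundary row.
        top_next[-1] = bot_next[1][:]
        bot_next[0] = top_next[-2][:]
        top_prev, top = top, top_next
        bot_prev, bot = bot, bot_next
    return top[:-1] + bot[1:]                # drop halo rows and merge

if __name__ == "__main__":
    single = run_single(16, 12, 20, 0.25)
    split = run_split(16, 12, 20, 0.25, 8)
    print("split run matches single run:", split == single)
```

Only one row per subdomain crosses the interface at each time step, so the volume of communication grows with the length of the interface while the computation grows with the area of the subdomain. In this sketch the two runs perform the same arithmetic on every cell, so the merged result should agree with the single-grid run.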

Bibliographic record

  • Source
    Computer Physics Communications | 2015 | 9 pages
  • Author affiliations

    Univ Alicante Dpto Fis Ingn Sistemas & Teoria Senal E-03080 Alicante Spain;

    Univ Politecn Cataluna Dept Arquitectura Comp Barcelona Spain;

    Univ Alicante Dpto Fis Ingn Sistemas & Teoria Senal E-03080 Alicante Spain;

    Univ Alicante Dpto Fis Ingn Sistemas & Teoria Senal E-03080 Alicante Spain;

    Univ Alicante Dpto Fis Ingn Sistemas & Teoria Senal E-03080 Alicante Spain;

    Univ Alicante Dpto Fis Ingn Sistemas & Teoria Senal E-03080 Alicante Spain;

    Univ Alicante Dpto Fis Ingn Sistemas & Teoria Senal E-03080 Alicante Spain;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Computer applications
  • Keywords

    FDTD; GPU computing; OMPI; SIMD extensions;
