
Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi


Abstract

Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special-purpose devices, or accelerators, connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphics Processing Unit (GPU) and the Intel Xeon Phi are two examples of such accelerators. Accelerators offer peak performance that can be well above that of the host processor. Exploiting this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double-precision general matrix multiply (DGEMM) operations are pervasive in electronic structure calculations, especially in methods that include electron correlation, such as density functional theory, second-order perturbation theory, and coupled cluster theory. Approaches that automatically determine, based on problem size, whether to use the host or an accelerator are explored, with computations occurring on the accelerator, the host, or both. For data transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB, with consistent upload and download rates of 5–5.6 GB/s and 5.4–6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
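To make the idea of size-based dispatch concrete, the following is a minimal sketch of how a host-versus-accelerator decision for a double precision general matrix multiply (DGEMM) might be estimated. It is not the method used in the paper. The names `Rates`, `dgemm_flops`, `transfer_bytes`, and `choose_device` are hypothetical, the sustained compute rates are placeholder values that would have to be measured on real hardware, and 1 GB is assumed to mean 10^9 bytes. Only the default transfer rates are taken from the abstract (the lower ends of the reported upload and download ranges).

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Throughput assumptions for a simple cost model (all hypothetical
    except the PCI-e defaults, which are the lower ends of the ranges
    quoted in the abstract)."""
    host_gflops: float          # assumed sustained DGEMM rate on the host
    accel_gflops: float         # assumed sustained DGEMM rate on the accelerator
    upload_gbs: float = 5.0     # host -> accelerator, GB/s
    download_gbs: float = 5.4   # accelerator -> host, GB/s


def dgemm_flops(m: int, n: int, k: int) -> float:
    """Floating-point operations for C(m x n) = A(m x k) * B(k x n)."""
    return 2.0 * m * n * k


def transfer_bytes(m: int, n: int, k: int) -> tuple[int, int]:
    """Bytes uploaded (A and B) and downloaded (C) for 8-byte doubles."""
    upload = 8 * (m * k + k * n)
    download = 8 * m * n
    return upload, download


def choose_device(m: int, n: int, k: int, rates: Rates) -> str:
    """Return 'accelerator' if estimated compute plus transfer time on the
    accelerator beats the estimated host compute time, else 'host'."""
    flops = dgemm_flops(m, n, k)
    up, down = transfer_bytes(m, n, k)
    host_time = flops / (rates.host_gflops * 1e9)
    accel_time = (
        flops / (rates.accel_gflops * 1e9)
        + up / (rates.upload_gbs * 1e9)
        + down / (rates.download_gbs * 1e9)
    )
    return "accelerator" if accel_time < host_time else "host"


if __name__ == "__main__":
    # Placeholder compute rates; replace with measured values.
    rates = Rates(host_gflops=100.0, accel_gflops=1000.0)
    for size in (64, 512, 4096):
        print(size, choose_device(size, size, size, rates))
```

Because the work grows as m·n·k while the data moved grows only as the matrix areas, a model like this always favours the host for small matrices and the accelerator for large ones; where the crossover falls depends entirely on the assumed rates.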
