Optimizing tensor contraction expressions for hybrid CPU-GPU execution

Ma W.; Krishnamoorthy S.; Villa O.; Kowalski K.; Agrawal G.


Abstract

Tensor contractions are generalized multidimensional matrix multiplication operations that occur widely in quantum chemistry. Executing tensor contractions efficiently on Graphics Processing Units (GPUs) requires addressing several challenges, including index permutation and small dimension sizes that reduce thread block utilization. Moreover, applying the same optimizations to many different expressions calls for a code generation tool. In this paper, we present our approach to automatically generating CUDA code that executes tensor contractions on GPUs, including management of data movement between the CPU and the GPU. To evaluate our tool, we generate GPU-enabled code for the most expensive contractions in CCSD(T), a key coupled cluster method, and incorporate it into NWChem, a popular computational chemistry suite. For this method, we demonstrate a speedup of more than a factor of 8.4 using one GPU compared with one CPU core, and more than 2.6 when utilizing the entire system with a hybrid CPU+GPU solution using 2 GPUs and 5 cores (instead of 7 cores per node). We further investigate tensor contraction code on a new series of GPUs, the Fermi GPUs, and provide several effective optimization algorithms. For the same CCSD(T) computation, a cluster with Fermi GPUs achieves a speedup of 3.4 over a cluster with T10 GPUs. With a single Fermi GPU on each node, we achieve a speedup of 43 over the sequential CPU version.
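To make "generalized multidimensional matrix multiplication" and the role of index permutation concrete, here is a minimal sketch in pure Python. It is not the paper's generated CUDA code: the example contraction C[i,j,k] = sum over l of A[i,l,k] * B[l,j], the helper names `contract_direct`, `matmul`, and `contract_via_gemm`, and the permute/multiply/permute scheme are illustrative assumptions. The idea shown is that permuting A so the summed index l comes last, and fusing i and k into a single row index, turns the contraction into one ordinary matrix multiply.

```python
from itertools import product


def contract_direct(A, B, ni, nj, nk, nl):
    """Reference version: C[i][j][k] = sum_l A[i][l][k] * B[l][j] via explicit loops."""
    C = [[[0.0] * nk for _ in range(nj)] for _ in range(ni)]
    for i, j, k, l in product(range(ni), range(nj), range(nk), range(nl)):
        C[i][j][k] += A[i][l][k] * B[l][j]
    return C


def matmul(X, Y):
    """Plain 2-D matrix multiply: (m x p) times (p x n)."""
    m, p, n = len(X), len(Y), len(Y[0])
    return [[sum(X[r][t] * Y[t][c] for t in range(p)) for c in range(n)]
            for r in range(m)]


def contract_via_gemm(A, B, ni, nj, nk, nl):
    """Same contraction, rewritten as index permutation plus one matrix multiply."""
    # Step 1 (hypothetical scheme): permute A[i][l][k] into a 2-D matrix whose
    # row index fuses (i, k) as i * nk + k and whose column index is l.
    A2 = [[A[i][l][k] for l in range(nl)] for i in range(ni) for k in range(nk)]
    # Step 2: a single (ni*nk x nl) by (nl x nj) matrix multiply.
    C2 = matmul(A2, B)
    # Step 3: permute the result back from row (i, k), column j to C[i][j][k].
    return [[[C2[i * nk + k][j] for k in range(nk)] for j in range(nj)]
            for i in range(ni)]


if __name__ == "__main__":
    ni, nj, nk, nl = 2, 3, 2, 4
    A = [[[float(i + 2 * l + 3 * k) for k in range(nk)] for l in range(nl)]
         for i in range(ni)]
    B = [[float(l - j) for j in range(nj)] for l in range(nl)]
    C = contract_via_gemm(A, B, ni, nj, nk, nl)
    assert C == contract_direct(A, B, ni, nj, nk, nl)
    print(C[0][0][0])  # prints 28.0
```

With small extents such as nl = 4 here, the fused matrices are thin, which is one way to see why the abstract notes that small dimension sizes can leave GPU thread blocks underutilized.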


Bibliographic record

Source: Cluster Computing, 2013, no. 1 (25 pages)
Authors: Ma W.; Krishnamoorthy S.; Villa O.; Kowalski K.; Agrawal G.
Original format: PDF
Text language: English
Chinese Library Classification (CLC): Molecular biology
Keywords: CUDA; Hybrid CPU+GPU execution; Tensor Contraction Expressions

Similar literature

1. Optimizing tensor contraction expressions for hybrid CPU-GPU execution [J]. Ma W., Krishnamoorthy S., Villa O. Cluster Computing, 2013, no. 1.
2. Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions [J]. Qingda Lu, Xiaoyang Gao, Sriram Krishnamoorthy. Journal of Parallel and Distributed Computing, 2012, no. 3.
3. Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry [J]. Albert Hartono, Qingda Lu, Thomas Henretty. The Journal of Physical Chemistry A: Molecules, Spectroscopy, Kinetics, Environment, & General Theory, 2009, no. 45.
4. Effective Utilization of Tensor Symmetry in Operation Optimization of Tensor Contraction Expressions [C]. Pai-Wei Lai, Huaijian Zhang, Samyam Rajbhandari. International Conference on Computational Science, 2013.
5. A Framework for Performance Optimization of Tensor Contraction Expressions [D]. Pai-Wei Lai. 2014.
6. Optimizing Hybrid Occlusion in Face-Jaw-Teeth Transplantation: A Preliminary Assessment of Real-Time Cephalometry as Part of the Computer-Assisted Planning and Execution Workstation for Craniomaxillofacial Surgery [O]. Ryan J. Murphy, Ehsan Basafa, Sepehr Hashemi.
7. Effective Utilization of Tensor Symmetry in Operation Optimization of Tensor Contraction Expressions [O]. Pai-Wei Lai, Huaijian Zhang, Samyam Rajbhandari. 2012.
