Journal of Parallel and Distributed Computing

Fine-grained Parallelization of Lattice QCD Kernel Routine on GPUs

Abstract

Simulation time for the classical problem of Lattice Quantum Chromodynamics (Lattice QCD) is dominated by one kernel routine responsible for computing the actions of a Dirac operator. This paper describes our experience in parallelizing this kernel routine. We explore parallelization granularities for the routine on Graphics Processing Units (GPUs). We show that fine-grained parallelization can outperform coarse-grained parallelization, provided that control-flow and communication effects are minimized. We propose two techniques for transforming control-flow-based code into control-free code. We also show how to reduce the communication effect by optimizing for commonly used sequences of calls to this routine. In our implementation on an NVIDIA 8800 GTX, we achieved an 8.3x speedup over an SSE2-optimized version running on a 2.8 GHz Intel Xeon CPU.
