Journal: Concurrency and Computation

369 Tflop/s molecular dynamics simulations on the petaflop hybrid supercomputer 'Roadrunner'



Abstract

We describe the implementation of a short-range parallel molecular dynamics (MD) code, SPaSM, on the heterogeneous general-purpose Roadrunner supercomputer. Each Roadrunner 'TriBlade' compute node consists of two AMD Opteron dual-core microprocessors and four IBM PowerXCell 8i enhanced Cell microprocessors (each consisting of one PPU and eight SPU cores), so that there are four MPI ranks per node, each with one Opteron core and one Cell. We briefly describe the Roadrunner architecture and some of the initial hybrid programming approaches that have been taken, focusing on the SPaSM application as a case study. An initial 'evolutionary' port, in which the existing legacy code runs with minor modifications on the Opterons and the Cells are used only to compute interatomic forces, achieves roughly a 2× speedup over the unaccelerated code. In contrast, our 'revolutionary' implementation adopts a Cell-centric view, with data structures optimized for, and living on, the Cells. The Opterons are mainly used to direct inter-rank communication and to perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. The performance measured for our initial implementation of a standard Lennard-Jones pair potential benchmark reached 369 Tflop/s of double-precision floating-point performance on the full Roadrunner system (27.7% of the machine's peak), nearly 10× faster than the unaccelerated (Opteron-only) version.
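For readers unfamiliar with the benchmark named in the abstract, the following is a minimal sketch of a truncated Lennard-Jones pair-force calculation in Python. It is not SPaSM's code and says nothing about how the Cell port is organized. The function name `lj_forces`, the reduced units (`epsilon = sigma = 1`), the cutoff of 2.5 sigma, the absence of periodic boundaries, and the plain all-pairs loop (a production short-range code would use a spatial decomposition instead) are all assumptions made for illustration.

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """Forces and potential energy for a truncated Lennard-Jones system.

    Illustrative sketch only (hypothetical helper, not taken from SPaSM):
    - U(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6) for r < r_cut, else 0
    - no periodic boundary conditions, no energy shift at the cutoff
    - O(N^2) loop over all pairs, chosen for clarity rather than speed
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    r_cut2 = r_cut * r_cut

    for i in range(n - 1):
        for j in range(i + 1, n):
            # Separation vector pointing from atom j to atom i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
            if r2 >= r_cut2:
                continue  # short-range: pairs beyond the cutoff do not interact

            s6 = (sigma * sigma / r2) ** 3   # (sigma/r)^6
            s12 = s6 * s6                    # (sigma/r)^12
            energy += 4.0 * epsilon * (s12 - s6)

            # F_i = -dU/dr * d/r = 24*epsilon*(2*s12 - s6) / r^2 * d
            f_over_r2 = 24.0 * epsilon * (2.0 * s12 - s6) / r2
            for k in range(3):
                forces[i][k] += f_over_r2 * d[k]
                forces[j][k] -= f_over_r2 * d[k]  # Newton's third law

    return forces, energy
```

Calling `lj_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])` evaluates a single pair at separation sigma, where the potential energy is zero and the two atoms repel each other. The inner pair loop is the kind of work that the abstract describes being moved onto the Cell processors.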
