
Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters


Abstract

We present a scalable parallelization scheme for high-order stencil computations that also optimizes memory behavior on multicore clusters. Our multilevel approach combines: (i) inter-node parallelization via spatial decomposition; (ii) inter-core parallelization via multithreading and explicit non-uniform memory access (NUMA) control; (iii) data locality optimizations through auto-tuned tiling for efficient use of hierarchical memory; and (iv) register blocking and data parallelism via single-instruction multiple-data (SIMD) techniques to utilize registers and exploit data locality. The scheme is applied to a sixth-order stencil-based finite-difference time-domain code. Weak-scaling parallel efficiency is over 98% on 32,768 BlueGene/P processors. Multithreading with explicit NUMA control attains a 9.9-fold speedup on a dual 12-core AMD Opteron system. Data locality optimizations achieve a 7.7-fold reduction of the last-level cache miss rate on Intel Nehalem, whereas register blocking increases data parallelism and thereby achieves 5.9 Gflops on a single core. Register blocking combined with multithreading achieves a 5.8-fold speedup on a single quad-core Nehalem.
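
To make the tiling idea in item (iii) concrete, here is a minimal sketch, not the authors' code. It assumes a 2D grid with unit spacing (the paper's application is a 3D time-domain code), uses the standard sixth-order central-difference weights for a second derivative, and takes a fixed `tile` parameter instead of an auto-tuned one. The function names are hypothetical.

```python
# Sixth-order central-difference weights for a second derivative (unit spacing).
C = (-49.0 / 18.0, 3.0 / 2.0, -3.0 / 20.0, 1.0 / 90.0)
R = 3  # stencil radius: three neighbors on each side along each axis


def laplacian_point(u, i, j):
    """Apply the 2D sixth-order stencil at interior point (i, j)."""
    acc = 2.0 * C[0] * u[i][j]  # center weight counted once per axis
    for k in range(1, R + 1):
        acc += C[k] * (u[i - k][j] + u[i + k][j] + u[i][j - k] + u[i][j + k])
    return acc


def laplacian_naive(u):
    """Sweep the interior row by row; boundary cells are left at 0.0."""
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(R, n - R):
        for j in range(R, m - R):
            out[i][j] = laplacian_point(u, i, j)
    return out


def laplacian_tiled(u, tile=8):
    """Same result as laplacian_naive, but visits the interior tile by tile.

    In a compiled language, each tile's working set can stay in cache while
    it is processed; `tile` is an illustrative fixed value, not auto-tuned.
    """
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for ii in range(R, n - R, tile):          # tile origin along rows
        for jj in range(R, m - R, tile):      # tile origin along columns
            for i in range(ii, min(ii + tile, n - R)):
                for j in range(jj, min(jj + tile, m - R)):
                    out[i][j] = laplacian_point(u, i, j)
    return out
```

Tiling only reorders the traversal, so both functions produce identical values. In pure Python the reordering brings no speed benefit; the sketch only shows the loop structure that a cache-aware implementation would use.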

Bibliographic record

  • Source
    Journal of Supercomputing | 2012, Issue 2 | pp. 946-966 | 21 pages
  • Author affiliations

    Collaboratory for Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, University of Southern California, Los Angeles, CA 90089, USA;

    Collaboratory for Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, University of Southern California, Los Angeles, CA 90089, USA;

    Collaboratory for Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, University of Southern California, Los Angeles, CA 90089, USA;

    Information Sciences Institute, University of Southern California, Suite 1001, 4676 Admiralty Way, Marina del Rey, CA 90292, USA;

    Information Sciences Institute, University of Southern California, Suite 1001, 4676 Admiralty Way, Marina del Rey, CA 90292, USA;

    School of Computing, University of Utah, Salt Lake City, UT 84112, USA;

    School of Computing, University of Utah, Salt Lake City, UT 84112, USA;

    Collaboratory for Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, University of Southern California, Los Angeles, CA 90089, USA;

    Collaboratory for Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, University of Southern California, Los Angeles, CA 90089, USA;

    Collaboratory for Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, University of Southern California, Los Angeles, CA 90089, USA;

  • Indexing: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    stencil computation; PDE solvers; finite differences; structured grid; NUMA; blocking; multithreading; single instruction multiple data parallelism; message passing; spatial decomposition

