Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms

Abstract

The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. In this work, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC's key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broad range of emerging multicore designs, including the recently released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSparc T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve significant performance improvements of 1.3-4.7x on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.
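To make two of the strategies named in the abstract concrete, here is a minimal sketch of grid replication and particle binning for a charge-deposition step. It is an illustration under strong simplifying assumptions, not the GTC implementation: the grid is 1D and periodic, the weights are plain linear (cloud-in-cell) rather than gyro-averaged, the "workers" are iterations of a serial loop standing in for threads, and every function name is hypothetical.

```python
import numpy as np

def deposit_reference(positions, charges, n_cells):
    """Serial scatter-add with linear weights on a periodic 1D grid (assumed setup)."""
    grid = np.zeros(n_cells)
    base = np.floor(positions)
    left = base.astype(np.int64) % n_cells
    frac = positions - base
    # np.add.at accumulates correctly when several particles hit the same cell.
    np.add.at(grid, left, charges * (1.0 - frac))
    np.add.at(grid, (left + 1) % n_cells, charges * frac)
    return grid

def deposit_replicated(positions, charges, n_cells, n_workers=4):
    """Grid replication: one private grid copy per worker, summed at the end."""
    replicas = np.zeros((n_workers, n_cells))
    chunks = np.array_split(np.arange(positions.size), n_workers)
    for worker, idx in enumerate(chunks):
        # Each worker writes only to its own replica, so the scatter needs
        # no atomics or locks; the cost is extra memory plus a final reduction.
        replicas[worker] = deposit_reference(positions[idx], charges[idx], n_cells)
    return replicas.sum(axis=0)

def bin_particles(positions, n_cells):
    """Particle binning: an ordering that groups particles by cell index."""
    cells = np.floor(positions).astype(np.int64) % n_cells
    return np.argsort(cells, kind="stable")
```

Replication trades memory for synchronization-free updates, while binning reorders particles so that neighbours in memory touch neighbouring grid cells. Atomic floating-point updates are the alternative when per-worker grid copies do not fit in memory.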

Bibliographic record

  • Source
    Parallel Computing | 2011, No. 9 | pp. 501-520 | 20 pages
  • Author affiliations

    Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States;

    School of Computer Science, Kookmin University, Seoul 136-702, Republic of Korea;

    Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States;

    Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States;

    Princeton Plasma Physics Laboratory, Princeton, NJ 08543, United States;

    Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States;

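  • Indexed in: Science Citation Index (SCI), United States; Engineering Index (EI), United States
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords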

    Particle-in-cell; Multicore; Manycore; Code optimization; Graphic processing units; Fermi;
