Journal of computer chemistry

Performance Tuning of Parallel Fragment Molecular Orbital Program (OpenFMO) for Effective Execution on K-computer


Abstract

We tuned the performance of the parallel fragment molecular orbital (FMO) program OpenFMO so that massively parallel FMO calculations run effectively on the K computer, one of the fastest supercomputers in the world. The tuning focused on load balancing within each small-scale molecular orbital calculation for a monomer or dimer. To keep the load balanced across processes, we used dynamic load balancing with a global counter, implemented with de facto standard parallelization libraries such as MPI and OpenMP to keep the code portable. In our implementation, one thread in each group serves as the master thread of the global counter and does not calculate molecular integrals; MPI thread support at the MPI_THREAD_SERIALIZED level is required, and three kinds of code are provided depending on the kind of thread, as shown in Figure 3, Figure 4, and Figure 5. With dynamic load balancing through the global counter, the molecular integral workload was well balanced across processes in each small-scale calculation (see Figure 7, lower), and the parallelization efficiency of the molecular integral part was very high (94% in 256-parallel execution; see Figure 8, "molecular integral part"). In contrast, the parallelization efficiency of the SCF part was poor enough to lower the efficiency of the monomer electronic structure calculations (see Figure 8). The large-scale performance evaluation showed that coarse-grained parallelization achieved high efficiency (93%) in 20480-parallel execution using the Intel Xeon PC cluster (see Figure 8 and Figure 9), and that the elapsed time of the FMO calculation for a large molecule (16,764 atoms) was only 30 min.
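The idea behind counter-based dynamic load balancing can be illustrated with a small simulation. This is only a sketch under stated assumptions, not OpenFMO's code: the abstract describes a master thread that serves the counter through MPI, whereas here the counter is a plain integer inside a single-process, event-driven simulation. The function names (`static_block`, `dynamic_counter`, `balance`) and the task costs are hypothetical, chosen only to show why fetching the next task on demand evens out uneven workloads.

```python
import heapq


def static_block(costs, n_workers):
    """Assign contiguous, equally sized blocks of tasks to workers up front."""
    loads = [0.0] * n_workers
    block = -(-len(costs) // n_workers)  # ceiling division
    for i, cost in enumerate(costs):
        loads[i // block] += cost
    return loads


def dynamic_counter(costs, n_workers):
    """Simulate workers that take the next task index from a shared counter
    each time they become idle."""
    # Heap of (time at which the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    loads = [0.0] * n_workers
    counter = 0  # global counter: index of the next unassigned task
    while counter < len(costs):
        t, w = heapq.heappop(free_at)  # the earliest idle worker asks next
        cost = costs[counter]
        counter += 1                   # "fetch and increment"
        loads[w] += cost
        heapq.heappush(free_at, (t + cost, w))
    return loads


def balance(loads):
    """Mean load divided by maximum load; 1.0 means perfectly balanced."""
    return sum(loads) / (len(loads) * max(loads))


# Hypothetical task costs: 64 tasks whose cost decreases steadily,
# standing in for integral batches of very different sizes.
costs = [float(64 - i) for i in range(64)]

static_loads = static_block(costs, 4)
dynamic_loads = dynamic_counter(costs, 4)

print("static  balance:", round(balance(static_loads), 3))
print("dynamic balance:", round(balance(dynamic_loads), 3))
```

In this toy setting the static block assignment gives the first worker all the expensive tasks, while the counter-based scheme lets whichever worker finishes first pick up the next task, so the per-worker totals end up much closer to the mean. In a real MPI code the counter itself has to be served across processes, and that is the role the abstract assigns to the dedicated master thread.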