Performance Tuning of Parallel Fragment Molecular Orbital Program (OpenFMO) for Effective Execution on K-computer

稲富 雄一; 眞木 淳; 本田 宏明; 高見 利也; 小林 泰三; 青柳 睦; 南 一生

Abstract

The parallel fragment molecular orbital (FMO) program OpenFMO was tuned to carry out massively parallel FMO calculations effectively on the K computer, one of the fastest supercomputers in the world. The tuning focused on load balancing within each small-scale molecular orbital calculation for monomers and dimers. To keep the load balanced across processes, we used dynamic load balancing with a global counter, and implemented the global counter using de facto standard parallelization libraries such as MPI and OpenMP to keep our code portable.

In our implementation of the global counter, one thread in each group serves as the master thread of the global counter and does not calculate molecular integrals. The implementation requires MPI thread support at the MPI_THREAD_SERIALIZED level, and three kinds of code are provided, depending on the kind of thread, as shown in Figure 3, Figure 4 and Figure 5.

As a result of applying dynamic load balancing with our global counter, the load of the molecular integral calculation was well balanced across processes in each small-scale calculation (see Figure 7, lower), and the parallelization efficiency of the molecular integral part became very high (94% in a 256-way parallel execution, see Figure 8, "molecular integral part"). On the other hand, the parallelization efficiency of the SCF part was poor enough to lower the efficiency of the monomer electronic structure calculations (see Figure 8). The large-scale performance evaluation showed that a high efficiency (93%) of coarse-grained parallelization was achieved in a 20480-way parallel execution using the Intel Xeon PC cluster (see Figure 8 and Figure 9), and the elapsed time of the FMO calculation for a large molecule (16,764 atoms) was only 30 min.
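The global-counter idea described in the abstract can be illustrated with a minimal shared-memory sketch. This is not the authors' code: the paper implements the counter with MPI and OpenMP and dedicates a master thread to it, whereas the sketch below swaps in Python threads and a lock-protected integer. The names `GlobalCounter`, `fetch`, and `run_dynamic`, and the integer task costs, are hypothetical and chosen only for illustration.

```python
import threading


class GlobalCounter:
    """Shared counter that hands out the next unclaimed task index.

    Assumption: a lock-protected integer stands in for the MPI-based
    counter served by a master thread in the paper.
    """

    def __init__(self, n_tasks):
        self._next = 0
        self._n_tasks = n_tasks
        self._lock = threading.Lock()

    def fetch(self):
        """Return the next task index, or None once every task is claimed."""
        with self._lock:
            if self._next >= self._n_tasks:
                return None
            idx = self._next
            self._next += 1
            return idx


def run_dynamic(costs, n_workers):
    """Run tasks with the given (hypothetical) costs on n_workers threads.

    Returns (owner, load): owner[i] is the worker that ran task i,
    load[r] is the total cost processed by worker r.
    """
    counter = GlobalCounter(len(costs))
    owner = [None] * len(costs)
    load = [0] * n_workers

    def worker(rank):
        while True:
            idx = counter.fetch()  # claim work only when idle
            if idx is None:
                break
            acc = 0
            for k in range(costs[idx]):  # stand-in for one batch of integrals
                acc += k
            owner[idx] = rank
            load[rank] += costs[idx]  # each worker writes only its own slot

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return owner, load


if __name__ == "__main__":
    owner, load = run_dynamic([5, 1, 3, 2, 8, 1, 4], n_workers=3)
    print("task owners:", owner)
    print("per-worker load:", load)
```

Because a worker asks for a new index only after finishing its current task, expensive tasks do not stall a fixed share of the work list, which is the property that dynamic load balancing relies on. The exact assignment of tasks to workers depends on thread scheduling and varies between runs.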

Bibliographic details

Source: Journal of computer chemistry | 2013, No. 2 | 11 pages
Authors: 稲富雄一; 眞木淳; 本田宏明; 高見利也; 小林泰三; 青柳睦; 南一生
Original format: PDF
CLC classification: Chemistry

Similar literature

1. Tuning parallel symbolic execution engine for better performance [J]. Anil Kumar KARNA, Jinbo DU, Haihao SHEN. Frontiers of computer science in China. 2018, No. 1
2. Effective fragment molecular orbital method: A merger of the effective fragment potential and fragment molecular orbital methods [J]. Steinmann C., Fedorov D.G., Jensen J.H. The journal of physical chemistry, A. Molecules, spectroscopy, kinetics, environment, & general theory. 2010, No. 33
3. Parallelization and performance tuning of molecular dynamics code with OpenMP [J]. BAI Shu-ren, RAN Li-ping, LU Kui-lin. Journal of Central South University of Technology. 2006, No. 3
4. Increased Efficiency of Parallel Calculations of Fragment Molecular Orbitals by Using Fine-Grained Parallelization on a HITACHI SR8000 Supercomputer [C]. Yuichi Inadomi, Tatsuya Nakano, Kazuo Kitaura. International Conference on High-Performance Computing and Networking. 2001
5. A journey through performance evaluation, tuning, and analysis of parallelized applications and parallel architectures: Quantitative approach [D]. Mustafa, Dheya G. 2013
6. Parallel MapReduce: Maximizing Cloud Resource Utilization and Performance Improvement Using Parallel Execution Strategies [O]. Ahmed Abdulhakim Al-Absi, Najeeb Abbas Al-Sammarraie, Wael Mohamed Shaher Yafooz
7. Performance Tuning of Parallel Fragment Molecular Orbital Program (OpenFMO) for Effective Execution on K-computer [O]. Yuichi INADOMI, Jun MAKI, Hiroaki HONDA. 2013