Published in: 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum

Hybrid MPI/Pthreads parallelization of the RAxML phylogenetics code

Abstract

A hybrid MPI/Pthreads parallelization was implemented in the RAxML phylogenetics code. New MPI code was added to the existing Pthreads production code to exploit parallelism at two algorithmic levels simultaneously: coarse-grained with MPI and fine-grained with Pthreads. This hybrid, multi-grained approach is well suited to current high-performance computers, which are typically clusters of multicore, shared-memory nodes. The hybrid version of RAxML is especially useful for a comprehensive phylogenetic analysis, i.e., execution of many rapid bootstraps followed by a full maximum likelihood search. Multiple multicore nodes can be used in a single run to speed up the computation and, hence, reduce the turnaround time. The hybrid code also allows more efficient utilization of a given number of processor cores. Moreover, it often returns a better solution than the stand-alone Pthreads code, because additional maximum likelihood searches are conducted in parallel using MPI. The comprehensive analysis algorithm involves four stages, in which coarse-grained parallelism decreases from stage to stage. The first three stages speed up well with MPI, while the last stage speeds up only with Pthreads. This leads to a tradeoff in effectiveness between MPI and Pthreads parallelization. The useful number of MPI processes increases with the number of bootstraps performed, but is typically limited to 10 or 20 by the parameters of the algorithm. The optimal number of Pthreads increases with the number of distinct patterns in the columns of the multiple sequence alignment, but is limited to the number of cores per node of the computer being used. For a benchmark problem with 218 taxa, 1,846 patterns, and 100 bootstraps run on the Dash computer at SDSC, the speedup of the hybrid code on 10 nodes (80 cores) was 6.5 compared to the Pthreads-only code on one node (8 cores) and 35 compared to the serial code. This run used 10 MPI processes with 8 Pthreads each. For another problem with 125 taxa, 19,436 patterns, and 100 bootstraps, the speedup on the Triton PDAF computer at SDSC was 38 on two nodes (64 cores) compared to the serial code. This run used 2 MPI processes with 32 Pthreads each.
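
As a rough check on the figures quoted in the abstract, parallel efficiency (speedup over the serial code divided by the number of cores used) can be computed directly from the reported numbers. The sketch below is illustrative only: the function name `parallel_efficiency` is hypothetical, and the Pthreads-only speedup over the serial code is not stated in the abstract but is inferred here as 35 / 6.5, on the assumption that both Dash speedups refer to the same serial baseline.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup over the serial code divided by the cores used."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Dash, 218 taxa: hybrid run on 10 nodes (80 cores), speedup 35 vs. serial.
dash_hybrid = parallel_efficiency(35, 80)

# Dash, Pthreads-only on one node (8 cores). The serial-relative speedup
# is an inference (35 / 6.5), not a figure reported in the abstract.
dash_pthreads_speedup = 35 / 6.5
dash_pthreads = parallel_efficiency(dash_pthreads_speedup, 8)

# Triton PDAF, 125 taxa: 2 nodes (64 cores), speedup 38 vs. serial.
triton_hybrid = parallel_efficiency(38, 64)

if __name__ == "__main__":
    print(f"Dash hybrid (80 cores):       {dash_hybrid:.3f}")
    print(f"Dash Pthreads-only (8 cores): {dash_pthreads:.3f}")
    print(f"Triton hybrid (64 cores):     {triton_hybrid:.3f}")
```

Under these assumptions, the hybrid Dash run comes out at roughly 44% efficiency on 80 cores, the inferred single-node Pthreads run at roughly 67% on 8 cores, and the Triton PDAF run at roughly 59% on 64 cores.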