The PIDOTS radiation transport code uses a spatially decomposed Integral Transport Matrix Method (ITMM) response matrix formulation within a red/black implementation of the Parallel Gauss-Seidel (PGS) framework to solve the SN approximation of the neutron transport equation on 3D Cartesian meshes. The code is intended to fully utilize the capabilities of modern, massively parallel high-performance computing (HPC) systems. The original testing of the code verified its implementation but revealed unexpected parallel performance losses as the processor count increased and the parallelization grain size decreased. Further analysis of the code and of the InfiniBand-based communication interconnect on the Falcon HPC at Idaho National Laboratory demonstrated that tightly coupled systems of point-to-point communication could yield larger than expected slowdowns on general-use HPCs. This slowing effect was further exacerbated by the sheer number of small data-size messages that PIDOTS used in each iteration. In this work we implemented a modified communication algorithm that significantly reduces per-iteration time for fine-grained cases on all processor counts, providing a 2x speedup for the most refined case. Additionally, evaluation of per-iteration performance is used to correlate communication cost to sub-optimal processor allocations and fabric behavior. Our new communication scheme has been evaluated across a variety of HPC systems with diverse architectures and hardware specifications. Results show that the improvements persist across all tested systems. This indicates that the modified communication scheme is likely applicable in the future to SN solvers on unstructured meshes and, more generally, to other highly communicative transport codes. Our results may also inform processor scheduling strategies for HPCs intended for massive multiprocessing.