Recent trends in computational architecture design are yielding processors with deep and complex memory hierarchies consisting of small capacity caches and large capacity main memory. CPU parallelism is also hierarchical, consisting of SIMD vector units contained within multiple computational cores, with one or more packages in a multi-socket system. Solving the deterministic discrete ordinates transport equation effectively on these architectures requires extracting concurrent work and mapping it efficiently to the processing elements in order to achieve performance close to the maximum attainable. This challenge becomes more acute when an unstructured spatial domain is required, where the sweep dependency between neighbouring spatial cells/elements is not implicit as it is for a structured grid. In this paper we introduce the transport community to the UnSNAP mini-app, a port of the well-known SNAP proxy application. UnSNAP was developed to investigate the performance of arbitrarily high-order discontinuous Galerkin finite element unstructured deterministic transport codes on advanced architectures. Approaches to local matrix assembly and solution are evaluated to assess their performance for different element orders, and the trade-offs with respect to performance and the memory capacity limits of advanced architectures are discussed. The performance-limiting factors will be explored on many-core architectures, including CPUs from Intel, AMD and Marvell (Arm). We will also discuss performing unstructured sweeps on GPU devices, highlighting the associated challenges.