Scientific software developers face the growing challenge of diverse parallel hardware, typified by large Linux clusters of multi-core CPUs, potentially enhanced with many-core accelerators from AMD, Intel and Nvidia. It is not clear which of these approaches will prevail, so scientific codes must be able to exploit any and all of them efficiently. In addition, problem decomposition over an MPI-backed cluster, together with more advanced high-level optimizations (e.g. tiling and efficient halo exchange), is an aspect of modern scientific software development that has been reimplemented unnecessarily across many codes. To address both issues, a domain-specific language (DSL) has been proposed and largely implemented, along with a simple Lattice-Boltzmann D3Q19 example. Results are presented for scaling on Piz Daint, together with a direct performance comparison across a range of the latest GPUs and many-core devices from AMD, Intel and Nvidia.