We present an implementation of algebraic multigrid (AMG) with simple aggregation techniques on multiple GPUs. It supports the parallel matrix representations typically used for finite volume discretisations. We employ the ICRS sparse matrix format together with the asynchronous MPI exchange mechanism used on CPUs, modified to make it suitable for GPU coprocessors. We show that the solution phase of the standard V-cycle AMG with simple aggregation is accelerated by a factor of up to 12. The solution phase of the more advanced Krylov-accelerated AMG runs up to 7 times faster on NVIDIA Tesla C2070 GPUs than on Intel X5650 CPUs.
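To illustrate what simple aggregation involves, the following is a minimal serial sketch and not the GPU implementation described above. It assumes a plain CSR layout rather than ICRS, a greedy neighbour-based grouping that may differ from the scheme actually used, and a piecewise-constant prolongation. The names `laplacian_1d`, `aggregate` and `coarse_matrix` are hypothetical helpers introduced only for this example.

```python
def laplacian_1d(n):
    """Build the 1D Laplacian tridiag(-1, 2, -1) in CSR form (test matrix)."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def aggregate(row_ptr, col_idx):
    """Greedy aggregation (an assumed variant): every still-unassigned node
    seeds a new aggregate and absorbs its still-unassigned neighbours."""
    n = len(row_ptr) - 1
    agg = [-1] * n
    n_agg = 0
    for i in range(n):
        if agg[i] != -1:
            continue
        agg[i] = n_agg
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if agg[j] == -1:
                agg[j] = n_agg
        n_agg += 1
    return agg, n_agg


def coarse_matrix(row_ptr, col_idx, vals, agg, n_agg):
    """Galerkin coarse operator A_c = P^T A P for piecewise-constant P.
    With such a P, each coarse entry is the sum of the fine entries
    linking the two aggregates. Rows are returned as {column: value}."""
    rows = [dict() for _ in range(n_agg)]
    for i in range(len(row_ptr) - 1):
        ci = agg[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            cj = agg[col_idx[k]]
            rows[ci][cj] = rows[ci].get(cj, 0.0) + vals[k]
    return rows


row_ptr, col_idx, vals = laplacian_1d(6)
agg, n_agg = aggregate(row_ptr, col_idx)
coarse = coarse_matrix(row_ptr, col_idx, vals, agg, n_agg)
```

For the six-point 1D Laplacian, this groups the nodes in pairs and yields a coarse operator with the same tridiagonal stencil on three unknowns. In a multi-GPU setting, the off-process neighbours of each partition would additionally have to be exchanged, which is the role of the asynchronous MPI mechanism mentioned above.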