Published in: Computing

HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs

Abstract

We present a novel heterogeneous parallel matrix multiplication algorithm that utilizes both central processing units (CPUs) and graphics processing units (GPUs) for large-scale matrices. Based on Strassen's method, we represent matrix multiplication work as a set of matrix addition and multiplication tasks among their sub-matrices. Then, we distribute the tasks to CPUs and GPUs while considering the characteristics of the tasks and computing resources to minimize the data communication overhead and fully utilize the available computing power. To handle a large matrix efficiently with limited GPU memory, we also propose a block-based work decomposition method. We then further improve the performance of our method by exploiting the concurrent execution abilities of a heterogeneous parallel computing system. We implemented our method on five different heterogeneous systems and applied it to matrices of various sizes. Our method generally shows higher performance than prior GPU-based matrix multiplication methods. Moreover, compared with the state-of-the-art GPU matrix multiplication library (i.e., CUBLAS), our method achieved up to 1.97 times higher performance using the same GPUs and CPU cores. In some cases, our method using a low-performance GPU (e.g., GTX 1060, 3 GB) achieved performance comparable to that of CUBLAS using a high-performance GPU (e.g., RTX 2080, 8 GB). Also, the performance of our method continues to improve as more computing resources, such as additional CPU cores and GPUs, are added. We could achieve such high performance because our approach fully utilizes the capacities of the given heterogeneous parallel computing systems while employing Strassen's method, which has a lower asymptotic complexity. These results demonstrate the efficiency and robustness of our algorithm.
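
To make the "set of matrix addition and multiplication tasks among sub-matrices" concrete, here is a minimal pure-Python sketch of Strassen's method, assuming square matrices stored as lists of lists. One recursion level turns a product into 10 sub-matrix additions/subtractions, 7 independent sub-matrix products (M1 to M7), and 8 combining additions, which is where the O(n^(log2 7)), roughly O(n^2.81), complexity comes from. The helper names (`add`, `sub`, `split`, `join`, `strassen`, `cutoff`) are hypothetical illustrations, not the paper's code; the CPU/GPU task distribution, block-based decomposition, and concurrent execution described in the abstract are not shown, and everything runs sequentially on the CPU.

```python
from typing import List

Matrix = List[List[int]]


def add(A: Matrix, B: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices (an addition task)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A: Matrix, B: Matrix) -> Matrix:
    """Element-wise difference A - B (also an addition-type task)."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def naive_mul(A: Matrix, B: Matrix) -> Matrix:
    """Classical O(n^3) product, used below the recursion cutoff."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M: Matrix):
    """Split a square matrix of even size into quadrants M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11: Matrix, C12: Matrix, C21: Matrix, C22: Matrix) -> Matrix:
    """Reassemble four quadrants into one matrix."""
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


def strassen(A: Matrix, B: Matrix, cutoff: int = 2) -> Matrix:
    """Multiply square matrices A and B with Strassen's method.

    Falls back to the classical product when n <= cutoff or n is odd.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return naive_mul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # 10 sub-matrix additions/subtractions feed 7 independent products.
    # A heterogeneous scheduler could dispatch each product to a CPU core
    # or a GPU (hypothetical); here they simply run one after another.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    # 8 more additions combine the products into the quadrants of C.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return join(C11, C12, C21, C22)
```

For example, `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1)` returns `[[19, 22], [43, 50]]`, the same as the classical product. Once their input sums are formed, the seven products do not depend on one another, which is what makes them natural units of work to hand to different processors.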