Efficient NAS Parallel Benchmark Kernels with CUDA

Abstract

The NAS Parallel Benchmarks (NPB) are one of the standard benchmark suites used to evaluate parallel hardware and software. Many research efforts have sought to provide parallel versions beyond the original OpenMP and MPI implementations. For GPU accelerators, only OpenCL and OpenACC versions are available as consolidated implementations. Our goal is to provide an efficient parallel implementation of the five NPB kernels with CUDA. Our contribution covers four aspects. First, we followed best parallel programming practices in implementing the NPB kernels with CUDA. Second, support for larger workloads (classes B and C) makes it possible to stress and investigate the memory of robust GPUs. Third, we show that NPB can be made efficient and suitable for GPUs even though the benchmarks were originally designed for CPUs; in some cases we achieve double the performance of the state of the art while also using memory efficiently. Fourth, we discuss new experiments comparing performance and memory usage against state-of-the-art OpenACC and OpenCL versions on a relatively new GPU architecture. The experimental results also show that our version is the best for all the NPB kernels compared with the OpenACC and OpenCL versions. The greatest differences were observed for the FT and EP kernels.
