Journal: Journal of Supercomputing

Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications

Abstract

The aim of this paper is to evaluate the performance of two new CUDA mechanisms, unified memory and dynamic parallelism, in real parallel applications, compared to versions written against the standard CUDA API. To gain insight into the performance of these mechanisms, we implemented three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, and used them for tests and experiments. Specifically, the tested applications are verification of Goldbach's conjecture, 2D heat transfer simulation and adaptive numerical integration. We experimented with various ways of introducing dynamic parallelism into an existing implementation and optimizing it further. Subsequently, we compared the best dynamic parallelism and unified memory versions to their respective standard API counterparts. Dynamic parallelism improved performance for the heat simulation; for numerical integration it performed better than the static version but worse than the iterative one; and for Goldbach's conjecture verification it gave worse results. In most cases, unified memory decreased performance. On the other hand, both mechanisms can contribute to simpler and more readable code. For dynamic parallelism, this holds for algorithms to which it applies naturally. Unified memory generally makes it easier for a programmer to enter the CUDA programming paradigm, as it resembles the traditional memory allocation and usage pattern.
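
The abstract contrasts a divide-and-conquer formulation of adaptive numerical integration with an iterative one, but gives no code. The following is a minimal CPU-only Python sketch of those two control-flow shapes, not the paper's CUDA implementation. The function names, the choice of Simpson's rule, and the work-list reading of "iterative version" are all assumptions made for illustration. In a dynamic-parallelism version, the recursive call would correspond to a kernel launching child kernels from the device.

```python
import math


def simpson(f, a, b):
    """Simpson's rule estimate of the integral of f over [a, b]."""
    m = 0.5 * (a + b)
    return (b - a) / 6.0 * (f(a) + 4.0 * f(m) + f(b))


def integrate_recursive(f, a, b, tol):
    """Divide and conquer: split an interval until the estimate settles.

    Hypothetical helper; each recursive call stands in for the step a
    dynamic-parallelism kernel would express as a child launch.
    """
    m = 0.5 * (a + b)
    whole = simpson(f, a, b)
    halves = simpson(f, a, m) + simpson(f, m, b)
    # Standard adaptive-Simpson acceptance test.
    if abs(halves - whole) <= 15.0 * tol:
        return halves
    return (integrate_recursive(f, a, m, 0.5 * tol)
            + integrate_recursive(f, m, b, 0.5 * tol))


def integrate_iterative(f, a, b, tol):
    """Same refinement rule, driven by an explicit work list (assumed design)."""
    total = 0.0
    work = [(a, b, tol)]
    while work:
        lo, hi, t = work.pop()
        m = 0.5 * (lo + hi)
        whole = simpson(f, lo, hi)
        halves = simpson(f, lo, m) + simpson(f, m, hi)
        if abs(halves - whole) <= 15.0 * t:
            total += halves  # interval accepted
        else:
            # Interval rejected: queue both halves for another pass.
            work.append((lo, m, 0.5 * t))
            work.append((m, hi, 0.5 * t))
    return total
```

Both functions accept and reject the same subintervals, so calling either with `math.sin` on `[0, math.pi]` and a small tolerance should give a value close to 2. They differ only in how pending work is tracked, which is the aspect the abstract's performance comparison turns on.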
