Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications

Jarzabek Lukasz; Czarnul Pawel

Abstract

This paper evaluates the performance of two new CUDA mechanisms, unified memory and dynamic parallelism, in real parallel applications, compared with versions written against the standard CUDA API. To gain insight into the performance of these mechanisms, we implemented three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, and used them for tests and experiments. Specifically, the tested applications are verification of Goldbach's conjecture, 2D heat transfer simulation and adaptive numerical integration. We experimented with various ways in which dynamic parallelism can be introduced into an existing implementation and then optimized further. We then compared the best dynamic parallelism and unified memory versions with their respective standard API counterparts. Dynamic parallelism improved performance for the heat simulation; for numerical integration it performed better than the static version but worse than the iterative version; and for Goldbach's conjecture verification it gave worse results. In most cases, unified memory decreased performance. On the other hand, both mechanisms can contribute to simpler and more readable code. For dynamic parallelism, this holds for algorithms to which it applies naturally. Unified memory generally makes it easier for a programmer to enter the CUDA programming paradigm, as it resembles the traditional memory allocation and usage pattern.
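To make the divide-and-conquer control flow of adaptive numerical integration concrete, the following is a minimal CPU-only sketch in Python. It is not the authors' implementation and uses no CUDA: the function names, the choice of Simpson's rule, the tolerance handling and the depth limit are all assumptions made for illustration. In a GPU version based on dynamic parallelism, each recursive refinement of a subinterval could plausibly be expressed as a child kernel launch, whereas an iterative version would typically manage the list of pending subintervals explicitly.

```python
import math


def simpson(f, a, b):
    """Simpson's rule estimate of the integral of f over [a, b]."""
    mid = (a + b) / 2.0
    return (b - a) / 6.0 * (f(a) + 4.0 * f(mid) + f(b))


def adaptive_integrate(f, a, b, tol=1e-9, depth=0, max_depth=40):
    """Recursively refine [a, b] until the local error estimate is small.

    Hypothetical helper for illustration only: each recursive call is the
    divide-and-conquer step that a GPU version might map to a child kernel.
    """
    mid = (a + b) / 2.0
    whole = simpson(f, a, b)
    left = simpson(f, a, mid)
    right = simpson(f, mid, b)
    diff = left + right - whole
    # Accept the refined estimate if the two-level difference is small
    # enough, or if the (assumed) recursion depth limit has been reached.
    if depth >= max_depth or abs(diff) <= 15.0 * tol:
        return left + right + diff / 15.0
    # Otherwise split the interval and halve the tolerance for each half.
    return (adaptive_integrate(f, a, mid, tol / 2.0, depth + 1, max_depth)
            + adaptive_integrate(f, mid, b, tol / 2.0, depth + 1, max_depth))


if __name__ == "__main__":
    # The integral of sin(x) over [0, pi] is exactly 2.
    print(adaptive_integrate(math.sin, 0.0, math.pi))
```

The amount of refinement depends on the integrand, so the work per subinterval is irregular; this is the kind of workload for which the abstract reports that dynamic parallelism beat a static version but not an iterative one.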


Bibliographic details

Source
Journal of Supercomputing | 2017, issue 12 | pp. 5378-5401 | 24 pages
Authors
Jarzabek Lukasz; Czarnul Pawel
Author affiliations

Gdansk Univ Technol, Fac Elect Telecommun & Informat, Gdansk, Poland;

Gdansk Univ Technol, Fac Elect Telecommun & Informat, Gdansk, Poland;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
CUDA; Dynamic parallelism; Unified memory; Parallel programming


Similar literature

1. Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs [J]. Knap Marcin, Czarnul Pawel. Journal of Supercomputing, 2019, issue 11.
2. A parallelism-based analytic approach to performance evaluation using application programs [J]. Bradley D.K., Larson J.L. Proceedings of the IEEE, 1993, issue 8.
3. A Performance Evaluation of Dynamic Parallelism for Fine-Grained, Irregular Workloads [J]. Max Plauth, Frank Feinbube, Frank Schlegel. International Journal of Networking and Computing, 2016, issue 2.
4. Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU [C]. Shixiong Xu, Gregg David. IEEE International Conference on Trust, Security and Privacy in Computing and Communications; IEEE International Conference on Big Data Science and Engineering; IEEE International Symposium on Parallel and Distributed Processing with Applications, 2015.
5. Enabling Efficient Parallelism for Applications with Dependences and Irregular Memory Accesses [D]. Jiang, Peng. 2019.
6. An Investigation of Unified Memory Access Performance in CUDA [O]. Raphael Landaverde, Tiansheng Zhang, Ayse K. Coskun.
7. Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs [O]. Marcin Knap, Paweł Czarnul. 2019.
