A Performance Evaluation of Dynamic Parallelism for Fine-Grained, Irregular Workloads

Max Plauth; Frank Feinbube; Frank Schlegel; Andreas Polze

Abstract

GPU compute devices have become very popular for general-purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited to irregular workloads, such as searching unbalanced trees. To mitigate this drawback, NVIDIA introduced an extension to GPU programming models called Dynamic Parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, so that subsequent work items can be refined based on intermediate results without any involvement of the host CPU. This work investigates methods for employing Dynamic Parallelism with the goal of improving workload distribution for tree search algorithms on modern GPU hardware. To evaluate the proposed approaches, a case study is conducted on the N-Queens problem. Extensive benchmarks indicate that, because the investigated problem is very fine-grained, the benefits of improved resource utilization fail to outweigh the high management overhead and runtime limitations. However, novel memory management concepts for passing parameters to child grids are presented. These general concepts are applicable to other, more coarse-grained problems that benefit from the use of Dynamic Parallelism.
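
To illustrate why N-Queens counts as an irregular workload, the following is a minimal CPU-side sketch in Python. It is not the authors' GPU implementation and does not use Dynamic Parallelism. It only profiles the search tree: for each position of the queen in the first row, it counts the solutions and the nodes visited in the subtree below. The function names `search` and `subtree_profile` are hypothetical and chosen for this example. Unequal subtree sizes are the kind of imbalance that spawning child work on the GPU is meant to address.

```python
def search(n, cols, diag1, diag2):
    """Return (solutions, nodes_visited) for the subtree below a partial placement.

    cols, diag1 and diag2 are bitmasks of the columns attacked in the current row
    by column, by one diagonal direction and by the other diagonal direction.
    """
    full = (1 << n) - 1
    if cols == full:
        # All n queens are placed: this node is a solution leaf.
        return 1, 1
    solutions, nodes = 0, 1  # count the current node itself
    free = full & ~(cols | diag1 | diag2)
    while free:
        bit = free & -free  # lowest free column in this row
        free ^= bit
        s, v = search(
            n,
            cols | bit,
            ((diag1 | bit) << 1) & full,  # diagonal attacks shift by one column per row
            (diag2 | bit) >> 1,
        )
        solutions += s
        nodes += v
    return solutions, nodes


def subtree_profile(n):
    """Return one (solutions, nodes_visited) pair per first-row column."""
    full = (1 << n) - 1
    profile = []
    for col in range(n):
        bit = 1 << col
        profile.append(search(n, bit, (bit << 1) & full, bit >> 1))
    return profile


if __name__ == "__main__":
    # Print the per-subtree work for an 8x8 board.
    for col, (solutions, nodes) in enumerate(subtree_profile(8)):
        print(f"first-row column {col}: {solutions} solutions, {nodes} nodes")
```

If each first-row subtree were assigned to one GPU thread, threads with small subtrees would sit idle while others finish. Because each node represents very little work, any mechanism that redistributes it has to be cheap, which is consistent with the overhead concern raised in the abstract.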

Bibliographic information

Source: International Journal of Networking and Computing, 2016, Issue 2, 18 pages
Authors: Max Plauth; Frank Feinbube; Frank Schlegel; Andreas Polze
Original format: PDF
Chinese Library Classification: Computing technology, computer technology

