Journal: International Journal of Networking and Computing

A Performance Evaluation of Dynamic Parallelism for Fine-Grained, Irregular Workloads


Abstract

GPU compute devices have become very popular for general-purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited to irregular workloads, such as searching unbalanced trees. To mitigate this drawback, NVIDIA introduced an extension to GPU programming models called Dynamic Parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, so that subsequent work items can be refined based on intermediate results without any involvement of the host CPU. This work investigates methods for employing Dynamic Parallelism with the goal of improving workload distribution for tree search algorithms on modern GPU hardware. To evaluate the proposed approaches, a case study is conducted on the N-Queens problem. Extensive benchmarks indicate that, because the investigated problem is so fine-grained, the benefits of improved resource utilization fail to outweigh the high management overhead and runtime limitations. However, novel memory management concepts for passing parameters to child grids are presented. These general concepts are applicable to other, more coarse-grained problems that benefit from the use of Dynamic Parallelism.
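
To make the notion of an irregular, unbalanced search tree concrete, the following is a minimal CPU-side sketch in Python. It is not the authors' GPU implementation and does not use Dynamic Parallelism. It only measures how unevenly the N-Queens search tree is distributed across the subtrees rooted at each first-row placement. The function names `explore` and `subtree_profile` are hypothetical and chosen for illustration.

```python
def explore(n, cols, diag1, diag2, row):
    """Return (nodes_visited, solutions) for the subtree below a partial placement.

    cols, diag1 and diag2 are bitmasks of the columns attacked in the current
    row by the queens already placed; row is the number of queens placed so far.
    """
    if row == n:
        return 1, 1  # all n queens placed: one node, one solution
    full = (1 << n) - 1
    free = full & ~(cols | diag1 | diag2)  # columns still safe in this row
    nodes, solutions = 1, 0
    while free:
        bit = free & -free  # lowest safe column
        free ^= bit
        sub_nodes, sub_solutions = explore(
            n,
            cols | bit,
            ((diag1 | bit) << 1) & full,  # diagonal attacks shift one way
            (diag2 | bit) >> 1,           # anti-diagonal attacks shift the other
            row + 1,
        )
        nodes += sub_nodes
        solutions += sub_solutions
    return nodes, solutions


def subtree_profile(n):
    """Return one (nodes_visited, solutions) pair per first-row column."""
    full = (1 << n) - 1
    profile = []
    for col in range(n):
        bit = 1 << col
        profile.append(explore(n, bit, (bit << 1) & full, bit >> 1, 1))
    return profile


if __name__ == "__main__":
    for col, (nodes, solutions) in enumerate(subtree_profile(8)):
        print(f"first-row column {col}: {nodes} nodes, {solutions} solutions")
```

If each first-row placement were handed to a fixed group of GPU threads, the groups assigned to the smaller subtrees would finish early and sit idle. Spawning child work from within a running subtree is one way to rebalance this, which is the kind of approach the abstract evaluates. The sketch only quantifies the imbalance and makes no claim about GPU performance.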