Performance Evaluation of Priority Queues for Fine-Grained Parallel Tasks on GPUs

Abstract

Graphics processing units (GPUs) are increasingly applied to accelerate tasks such as graph problems and discrete-event simulation that are characterized by irregularity, i.e., a strong dependence of the control flow and memory accesses on the input. The core data structures in many of these irregular tasks are priority queues, which guide the progress of the computations and can easily become the bottleneck of an application. To our knowledge, no systematic comparison of priority queue implementations on GPUs currently exists in the literature. We close this gap with a performance evaluation of GPU-based priority queue implementations for two applications: discrete-event simulation and parallel A* path searches on grids. We focus on scenarios requiring large numbers of priority queues holding up to a few thousand items each. We present performance measurements covering linear queue designs, implicit binary heaps, splay trees, and a GPU-specific proposal from the literature. The measurement results show that up to about 500 items per queue, circular buffers frequently outperform tree-based queues for the considered applications, particularly under a simple parallelization of individual item enqueue operations. We analyze profiling metrics to explore classical queue designs in light of the importance of high hardware utilization as well as homogeneous computations and memory accesses across GPU threads.

Bibliographic record

Source
IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems | 2017 | 264p | 11 pages
Authors
Nikolai Baudis; Florian Jacob; Philipp Andelfinger
Original format: PDF
Chinese Library Classification: TP3-53
Keywords
Graphics processing units; Computational modeling; Instruction sets; Data structures; Synchronization; Analytical models

Similar literature

1. Performance evaluation of GPU- and cluster-computing for parallelization of compute-intensive tasks [J]. Alexander Doeschl, Max-Emanuel Keller, Peter Mandl. International Journal of Web Information Systems. 2021, No. 4
2. Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs [J]. Knap Marcin, Czarnul Pawel. Journal of Supercomputing. 2019, No. 11
3. Performance evaluation of GPU parallelization, space-time adaptive algorithms, and their combination for simulating cardiac electrophysiology [J]. Oliveira Rafael Sachetto, Rocha Bernardo Martins, Burgarelli Denise, Communications in Numerical Methods in Engineering. 2018, No. 2
4. Performance Evaluation of Priority Queues for Fine-Grained Parallel Tasks on GPUs [C]. Nikolai Baudis, Florian Jacob, Philipp Andelfinger. 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. 2017
5. Performance Evaluation of Blocking and Non-Blocking Concurrent Queues on GPUs [D]. Pourmeidani, Hossein. 2018
6. Based on Regular Expression Matching of Evaluation of the Task Performance in WSN: A Queue Theory Approach [O]. Jie Wang, Kai Cui, Kuanjiu Zhou
7. High-Performance Priority Queues for Parallel Crawlers [O]. Mauricio Marin, Rodrigo Paredes, Carolina Bonacic. 2009
8. Queueing Network Systems with Unbalanced Flows and Their Applications to Performance Evaluation of Highly Parallel Distributed Information Systems. Revision [R]. Wang, Y. R., Madnick, S. E. 1984
