Allocation-oriented Algorithm Design with Application to GPU Computing.


Abstract

The wide data-parallelism of GPU processor design facilitates the execution of many concurrent, fine-grained tasks with unprecedented performance and efficiency. However, contemporary opinion regards GPU architecture as being poorly suited for many parallelizations that impose dynamic dependences between processing elements. This dissertation specifically attends to problems whose efficient solutions require cooperative allocation, i.e., their dependences stem from dynamic data placement within shared structures. The contribution of this thesis is a set of design patterns and reusable, tunable software primitives that foster the construction of cooperative, allocation-oriented parallelizations that are significantly faster, more efficient, more flexible, and more performance-portable than the state of the art.

Whereas concurrent CPU programs typically leverage fine-grained atomic operations for coordinating data placement, we demonstrate the advantages of parallel prefix sum as a high-performance, bulk-synchronous alternative for cooperative allocation. We construct very efficient algorithms for global and local prefix sum by employing flexible granularity-coarsening techniques for properly balancing serial and parallel workloads. The resulting efficiency gains permit kernel-fusion techniques whereby application-specific logic can be incorporated within our prefix sum constructs with little or no overhead.

To demonstrate the viability of our methods, we construct cooperative GPU implementations for a variety of parallel list-processing primitives including reduction, prefix scan, duplicate removal, histogram, and reduce-by-key. We evaluate their performance across a wide spectrum of problem sizes, types, and target architectures. To achieve performance-portability, we present a policy-based autotuning design idiom in which we leave implementation decisions having opaque performance consequences unbound within the program text.
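The core allocation idea the abstract describes can be illustrated in a few lines. The sketch below is not from the dissertation; it is a minimal serial model of how an exclusive prefix sum over per-thread output counts yields unique, data-independent write offsets, so that cooperative output (here, stream compaction) needs no fine-grained atomic increments. The function names `exclusive_scan` and `cooperative_compact` are illustrative, not the dissertation's API.

```python
def exclusive_scan(counts):
    """Exclusive prefix sum: offsets[i] = sum(counts[:i]).

    Returns the offsets and the total, which is exactly the amount of
    output storage the cooperating 'threads' must allocate."""
    offsets, running = [], 0
    for c in counts:
        offsets.append(running)
        running += c
    return offsets, running


def cooperative_compact(items, predicate):
    """Stream compaction via prefix-sum allocation: element i is kept
    iff predicate(items[i]); each kept element is written to the slot
    the scan assigned it, with no contention between writers."""
    counts = [1 if predicate(x) else 0 for x in items]
    offsets, total = exclusive_scan(counts)
    out = [None] * total
    for i, x in enumerate(items):
        if counts[i]:
            out[offsets[i]] = x  # placement is fixed before any write
    return out
```

For example, `cooperative_compact([3, 1, 4, 1, 5, 9], lambda x: x > 2)` yields `[3, 4, 5, 9]`. On a GPU the scan itself is parallel and the final scatter is one uncontended write per thread, which is the bulk-synchronous alternative to atomics that the abstract advocates.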
Finally, we showcase high performance implementations for two archetypal problems in computer science: sorting and breadth-first search (BFS). We achieve excellent performance for both genres, demonstrating multiple factors of speedup over prior parallelizations for GPU and conventional multi-core platforms.
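To connect the two themes, the following serial sketch (again illustrative, not the dissertation's implementation) shows how level-synchronous BFS is itself an allocation-oriented problem: before the next frontier can be gathered, the per-vertex neighbor counts are prefix-summed to size and position each vertex's output, mirroring how a GPU BFS sizes its output queue before scattering.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency-list graph.

    Each iteration first computes offsets (a prefix sum of adjacency-list
    lengths) so every frontier vertex knows where to write its neighbors,
    then scatters, then filters out already-visited vertices."""
    depth = [-1] * len(adj)
    depth[source] = 0
    frontier, level = [source], 0
    while frontier:
        # Prefix sum of neighbor counts = cooperative output allocation.
        offsets, total = [], 0
        for v in frontier:
            offsets.append(total)
            total += len(adj[v])
        gathered = [None] * total
        for i, v in enumerate(frontier):
            for j, w in enumerate(adj[v]):
                gathered[offsets[i] + j] = w  # uncontended writes
        level += 1
        frontier = [w for w in gathered if depth[w] == -1]
        for w in frontier:
            depth[w] = level
    return depth
```

On a square graph `adj = [[1, 2], [0, 3], [0, 3], [1, 2]]`, `bfs_levels(adj, 0)` returns `[0, 1, 1, 2]`. Note this simple filter may enqueue a vertex twice within one level; a real GPU implementation deduplicates, but the allocation pattern is the point here.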

Record details

  • Author: Merrill, Duane G., III
  • Institution: University of Virginia
  • Degree-granting institution: University of Virginia
  • Subjects: Computer Engineering; Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 188 p.
  • Format: PDF
  • Language: English
