2012 Third International Conference on Networking and Computing

Asynchronous Memory Machine Models with Barrier Synchronization



Abstract

The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. It has been assumed that warps (i.e., groups of threads) on the DMM and the UMM work synchronously in a round-robin manner. However, warps work asynchronously on actual GPUs, in the sense that they may be dispatched for execution randomly (or arbitrarily). The first contribution of this paper is to introduce asynchronous versions of the DMM and the UMM, in which warps are dispatched arbitrarily. Instead, we assume that threads can execute a "syncthreads" instruction for barrier synchronization. Since the barrier synchronization operation is costly, we should evaluate and minimize the number of barrier synchronization operations performed by parallel algorithms. The second contribution of this paper is a parallel algorithm that computes the sum of $n$ numbers in optimal computing time using few barrier synchronization steps. Our parallel algorithm computes the sum of $n$ numbers in $O(\frac{n}{w}+l\log n)$ time units and $O(\log\frac{l}{w}+\log\log w)$ barrier synchronization steps using $wl$ threads, both on the asynchronous DMM and on the asynchronous UMM with width $w$ and latency $l$. We also prove that the computing time is optimal because it matches the theoretical lower bound. Quite surprisingly, the number of barrier synchronization steps and the number of threads are independent of $n$. Even if the input size $n$ is quite large, our parallel algorithm computes the sum in an optimal number of time units and a fixed number of barrier synchronization steps using a fixed number of threads.
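To make the role of barrier synchronization concrete, the following is a minimal CUDA sketch of a conventional shared-memory tree reduction for summing n numbers. It is an illustrative example only, not the algorithm analyzed in the paper; the kernel name blockSum, the block size of 256, and the host-side setup are assumptions introduced here. Each __syncthreads() call is one barrier synchronization step: every block performs log2(256) = 8 such steps (plus one after loading), and the per-block partial sums are then combined on the host.

    // Illustrative sketch only (not the paper's algorithm): a standard
    // shared-memory tree reduction where __syncthreads() is the barrier.
    #include <cstdio>
    #include <vector>
    #include <numeric>
    #include <cuda_runtime.h>

    #define BLOCK_SIZE 256  // assumed block size for this sketch

    __global__ void blockSum(const float *in, float *out, int n) {
        __shared__ float buf[BLOCK_SIZE];
        int tid = threadIdx.x;
        int idx = blockIdx.x * blockDim.x + tid;

        // Load one element per thread (0 if out of range).
        buf[tid] = (idx < n) ? in[idx] : 0.0f;
        __syncthreads();  // barrier: all loads complete before reducing

        // Tree reduction in shared memory: log2(BLOCK_SIZE) barrier steps.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride) buf[tid] += buf[tid + stride];
            __syncthreads();  // barrier after each halving step
        }

        // Thread 0 writes this block's partial sum.
        if (tid == 0) out[blockIdx.x] = buf[0];
    }

    int main() {
        const int n = 1 << 20;
        const int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

        std::vector<float> h_in(n, 1.0f);   // all ones, so the sum equals n
        std::vector<float> h_out(blocks);

        float *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, blocks * sizeof(float));
        cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        blockSum<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);
        cudaMemcpy(h_out.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

        // Finish the reduction over the per-block partial sums on the host.
        float total = std::accumulate(h_out.begin(), h_out.end(), 0.0f);
        printf("sum = %.0f (expected %d)\n", total, n);

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

In this naive sketch the barrier count per block is fixed by the block size, but the amount of sequential work grows with n; the paper's contribution is a scheme on the asynchronous DMM/UMM whose barrier-step count and thread count are both independent of n while the total time remains optimal.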
