An Implementation of Conflict-Free Offline Permutation on the GPU

Abstract

The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of the shared memory access of GPUs. The bank conflicts should be avoided for maximizing the bandwidth of the shared memory access. Offline permutation of an array is a task to copy all elements in array $a$ into array $b$ along a permutation given in advance. The main goal of this paper is to implement a conflict-free permutation algorithm on the DMM in a GPU. We have also implemented straightforward permutation algorithms on the GPU. The experimental results for 1024 float numbers on NVIDIA GeForce GTX-680 show that a straightforward permutation algorithm takes 246ns and 877ns for random permutation and bit-reversal permutation, respectively. Quite surprisingly, our conflict-free permutation algorithm runs in 165ns both for random permutation and for bit-reversal permutation although it performs more memory access operations. It follows that our conflict-free permutation is 1.5 times faster for random permutation and 5.3 times faster for bit-reversal permutation.
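The gap between the random and bit-reversal timings of the straightforward algorithm can be illustrated with a small bank-conflict count. The following is a minimal sketch, not the authors' implementation. It assumes a DMM-style memory in which address `i` lives in bank `i mod 32`, warps of 32 consecutive threads, and a straightforward kernel in which thread `i` performs `b[p[i]] = a[i]`. The names `WIDTH`, `bit_reversal`, `congestion`, and `write_congestions` are hypothetical helpers introduced only for this illustration.

```python
# Illustrative sketch only (not the paper's code): counts how many distinct
# addresses in one warp's write land in the same bank for b[p[i]] = a[i].
# Assumptions: 32 banks, 32 threads per warp, address i stored in bank i % 32.
import random

WIDTH = 32  # assumed number of banks and of threads per warp
N = 1024    # array size used in the experiment reported in the abstract


def bit_reversal(n):
    """Return the bit-reversal permutation of range(n); n must be a power of two >= 2."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


def congestion(addresses, width=WIDTH):
    """Largest number of distinct addresses in one warp access that share a bank."""
    per_bank = {}
    for addr in set(addresses):  # identical addresses count once
        bank = addr % width
        per_bank[bank] = per_bank.get(bank, 0) + 1
    return max(per_bank.values())


def write_congestions(perm, width=WIDTH):
    """Congestion of the writes b[perm[i]] for each warp of consecutive threads."""
    return [congestion(perm[s:s + width], width) for s in range(0, len(perm), width)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    random_perm = list(range(N))
    rng.shuffle(random_perm)

    print("identity     :", max(write_congestions(list(range(N)))))
    print("random       :", max(write_congestions(random_perm)))
    print("bit-reversal :", max(write_congestions(bit_reversal(N))))
```

Under these assumptions, every warp of the bit-reversal permutation writes all 32 of its elements to a single bank, because the low five bits of the reversed index are the reversed warp number. A random permutation spreads its writes over many banks. This matches the direction of the reported timings, although the sketch models only conflict counts and not actual running time.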

Bibliographic record

Source: 2012 Third International Conference on Networking and Computing, 2012, pp. 226-232 (7 pages)
Conference location: Naha (JP)
Authors: Kasagi Akihiko; Nakano Koji; Ito Yasuaki
Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords: CUDA; GPU; bank conflict; data movement; memory machine models; shared memory

Similar literature

1. A GPU Implementation of Conflict-Free Offline Permutation [J]. Akihiko Kasagi, Koji Nakano, Yasuaki Ito. 電子情報通信学会技術研究報告, 2012, No. 237.
2. A GPU Implementation of Conflict-Free Offline Permutation [J]. Akihiko Kasagi, Koji Nakano, Yasuaki Ito. 電子情報通信学会技術研究報告. コンピュータシステム. Computer Systems, 2012, No. 237.
3. An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU implementation [J]. Akihiko Kasagi, Koji Nakano, Yasuaki Ito. 電子情報通信学会技術研究報告. コンピュータシステム. Computer Systems, 2013, No. 169.
4. An Implementation of Conflict-Free Offline Permutation on the GPU [C]. Akihiko Kasagi, Koji Nakano, Yasuaki Ito. ICNC 2012, 2012.
5. Blocked Algorithms for Neural Networks: Design and Implementation on GPUs [D]. Philippe Tillet. 2020.
6. Molecular Dynamics Simulations Using the Drude Polarizable Force Field on GPUs with OpenMM: Implementation, Validation, and Benchmarks [O]. Jing Huang, Justin A. Lemkul, Peter K. Eastman.
7. Figure 2: CPU/GPU workflow of GPU-based parallel implementation of permutation testing [O].
