2012 Third International Conference on Networking and Computing

An Implementation of Conflict-Free Offline Permutation on the GPU



Abstract

The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of shared memory access on GPUs. To maximize shared memory bandwidth, bank conflicts should be avoided. Offline permutation of an array is the task of copying all elements of an array $a$ into an array $b$ according to a permutation given in advance. The main goal of this paper is to implement a conflict-free permutation algorithm for the DMM on a GPU. We have also implemented straightforward permutation algorithms on the GPU. Experimental results for 1024 float numbers on an NVIDIA GeForce GTX-680 show that a straightforward permutation algorithm takes 246 ns for a random permutation and 877 ns for the bit-reversal permutation. Quite surprisingly, our conflict-free permutation algorithm runs in 165 ns for both the random permutation and the bit-reversal permutation, although it performs more memory access operations. It follows that our conflict-free permutation is 1.5 times faster for the random permutation and 5.3 times faster for the bit-reversal permutation.
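To illustrate why the bit-reversal permutation is so costly for a straightforward algorithm, the following Python sketch counts bank conflicts under a simplified model. It is not the paper's implementation. It assumes 32 memory banks, warps of 32 threads, address `x` mapped to bank `x % 32`, and a straightforward kernel in which thread `i` writes `b[perm[i]] = a[i]`; the function names are hypothetical. A warp whose writes hit the same bank `d` times is assumed to need `d` serialized write stages.

```python
from collections import Counter

WIDTH = 32  # assumed number of banks, also the assumed warp size


def bit_reversal(n):
    """Return the bit-reversal permutation of 0..n-1 (n must be a power of two)."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


def warp_conflict_degrees(addresses, width=WIDTH):
    """For each warp of `width` consecutive threads, return the largest
    number of its addresses that fall into a single bank."""
    degrees = []
    for start in range(0, len(addresses), width):
        banks = Counter(addr % width for addr in addresses[start:start + width])
        degrees.append(max(banks.values()))
    return degrees


def straightforward_write_stages(perm, width=WIDTH):
    """Total serialized write stages when thread i writes b[perm[i]] = a[i]."""
    return sum(warp_conflict_degrees(perm, width))


n = 1024
print(straightforward_write_stages(list(range(n))))   # identity: 32
print(straightforward_write_stages(bit_reversal(n)))  # bit-reversal: 1024
```

In this model the identity permutation needs one write stage per warp, while for bit-reversal all 32 writes of each warp land in one bank, so every warp is fully serialized. The model only counts conflicts; it does not predict the measured running times quoted above.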
