...
首页> 外文期刊>Parallel and Distributed Systems, IEEE Transactions on >Exploiting Massive Parallelism for IndexingMulti-Dimensional Datasets on the GPU
【24h】

Exploiting Massive Parallelism for IndexingMulti-Dimensional Datasets on the GPU

机译:利用大规模并行性在GPU上为多维数据集建立索引

获取原文
获取原文并翻译 | 示例
           

摘要

Inherently multi-dimensional n-ary indexing structures such as R-trees are not well suited for the GPU because of their irregular memory access patterns and recursive back-tracking function calls. It has been known that traversing hierarchical tree structures in an irregular manner makes it difficult to exploit parallelism and to maximize the utilization of GPU processing units. Moreover, the recursive tree search algorithms often fail with large indexes because of the GPU’s tiny runtime stack size. In this paper, we propose a novel parallel tree traversal algorithm— for multi-dimensional range queries that avoids recursion and irregular memory access. The proposed MPRS algorithm traverses hierarchical tree structures with mostly contiguous memory access patterns without recursion, which offers more chances to optimize the parallel SIMD algorithm. We implemented the proposed MPRS range query processing algorithm on n-ary bounding volume hierarchies including R-trees and evaluated its performance using real scientific datasets on an NVIDIA Tesla M2090 GPU. Our experiments show braided parallel SIMD friendly MPRS range query algorithm achieves at least 80 percent warp execution efficiency while task parallel tree traversal algorithm shows only 9-15 percent efficiency. Moreover, braided parallel MPRS algorithm accesses 7-20 times less amount of global memory than task parallel parent link algorithm by virtue of minimal warp divergence.
机译:固有的多维n元索引结构(例如R树)由于其不规则的内存访问模式和递归的回溯函数调用而不适用于GPU。已知以不规则的方式遍历分层树结构使得难以利用并行性和最大化GPU处理单元的利用。此外,由于GPU的运行时堆栈很小,因此递归树搜索算法通常因索引较大而失败。在本文中,我们提出了一种新颖的并行树遍历算法-用于多维范围查询,可避免递归和不规则的内存访问。所提出的MPRS算法遍历具有大多数连续存储器访问模式的层次树结构,而无需递归,这为优化并行SIMD算法提供了更多机会。我们在包括R树的n元边界体积层次结构上实现了建议的MPRS范围查询处理算法,并在NVIDIA Tesla M2090 GPU上使用真实的科学数据集评估了其性能。我们的实验表明,编织并行SIMD友好的MPRS范围查询算法可实现至少80%的翘曲执行效率,而任务并行树遍历算法仅显示9-15%的效率。此外,编织并行MPRS算法通过最小的翘曲散度,比任务并行父链接算法访问的全局内存少7-20倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号