首页> 外文会议>Italian Symposium on Advanced Database Systems >Parallelizing Approximate Search on Adaptive Radix Trees
【24h】

Parallelizing Approximate Search on Adaptive Radix Trees

机译:对自适应基数树的平行化近似搜索

获取原文

摘要

Efficient searching in a number of strings is a common task in computer science and radix-trees are often used as a compact storage for string saving. One variant is the Adaptive-Radix-Tree (ART). ART has adaptive node sizes for more compact and cache-friendly memory layout. Graphical Processing Units (GPUs) can be used as hardware accelerator for massively parallel tasks and in addition they use very fast memory. We propose a parallel approximate search in the ART on CPU and GPU to optimize the throughput of queries and speed up applications that depends on these algorithms. Thereby we use the edit distance to compare two search keys in the tree and select appropriate values. We use the CPU for experimental comparison with the GPU, which have several thousand cores and modern processors typically have four to several dozens cores, but theses cores and RAM are more flexible. We propose several variations of the CPU algorithm like fixed vs. dynamic memory layouts and pointer vs. pointer-less data structures. In our experimental evaluation with OpenCL on ROCm 3.0, AMDs platform for GPU-Enabled HPC and Ultrascale Computing, the speedup and throughput of the GPU implementation for the approximate search in comparison with the best CPU variant are in the maximum up to factor 4.16 depending on the size of the tree and batch size. The speedup between the best and the worst CPU algorithm is up to factor 11.67, depending on tree and batch size.
机译:在许多字符串中的高效搜索是计算机科学中的常见任务,并且基地树通常用作串节省的紧凑存储器。一个变体是自适应 - 基地树(ART)。 ART具有自适应节点大小,可用于更紧凑和缓存友好的内存布局。图形处理单元(GPU)可用作硬件加速器,用于大规模并行任务,并因此使用非常快速的内存。我们提出了在CPU和GPU上的技术方面的平行近似搜索,以优化查询的吞吐量和加速取决于这些算法的应用程序。因此,我们使用编辑距离来比较树中的两个搜索密钥,并选择适当的值。我们使用CPU与GPU进行实验比较,其中有几千个核心和现代处理器通常具有四个到几十个核心,而是核心和RAM更加灵活。我们提出了像固定与动态内存布局和指针与指针数据结构一样的CPU算法的多个变体。在我们对Roccl 3.0上的OpenCL的实验评估中,AMDS支持GPU的HPC和UltraScale Computing的平台,与最佳CPU变体相比,GPU实现的GPU实现的加速和吞吐量在最大程度上至4.16,具体取决于因素4.16树和批次大小的大小。最佳和最差CPU算法之间的加速度最多可为12.67,具体取决于树和批次大小。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号