Conference: International Conference on Enterprise Information Systems

Efficient Filter-Based Algorithms for Exact Set Similarity Join on GPUs

Abstract

Set similarity join is a core operation for text data integration, cleaning, and mining. Most state-of-the-art solutions rely on inherently sequential, CPU-based algorithms. In this paper, we propose a parallel algorithm for set similarity join that harnesses the power of GPU systems through filtering techniques and divide-and-conquer strategies that scale well with data size. We also present parallel algorithms for all data pre-processing phases. The result is an end-to-end solution to the set similarity join problem that receives input text data, outputs pairs of similar strings, and executes entirely on the GPU. Our experimental results on standard datasets show substantial speedups over the fastest algorithms in the literature.
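
To make the problem statement concrete, the following is a minimal sequential sketch of a filter-and-verify set similarity join. It is not the paper's GPU algorithm: the abstract does not name the similarity function, the tokenizer, or the specific filters, so Jaccard similarity, whitespace tokenization, and a simple length filter are assumptions chosen for illustration, and the names `jaccard` and `similarity_join` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; taken as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(strings, threshold):
    """Return sorted index pairs (i, j), i < j, with Jaccard >= threshold.

    Assumption: each string is tokenized into a set of lowercase words.
    """
    # Pre-processing: turn every input string into a token set.
    token_sets = [set(s.lower().split()) for s in strings]

    # Process sets in order of increasing size so the length filter
    # can stop scanning as soon as the candidates become too large.
    order = sorted(range(len(strings)), key=lambda i: len(token_sets[i]))

    results = []
    for pos, i in enumerate(order):
        a = token_sets[i]
        for j in order[pos + 1:]:
            b = token_sets[j]
            # Length filter: with |a| <= |b|, Jaccard(a, b) <= |a| / |b|.
            # If that bound is below the threshold, this b and every
            # larger set after it can be skipped without verification.
            if len(b) > 0 and len(a) / len(b) < threshold:
                break
            # Verification: compute the exact similarity.
            if jaccard(a, b) >= threshold:
                results.append((min(i, j), max(i, j)))
    return sorted(results)


if __name__ == "__main__":
    docs = [
        "gpu set similarity join",
        "set similarity join gpu",
        "parallel filter algorithms",
        "gpu set similarity",
    ]
    print(similarity_join(docs, 0.7))
```

The filter only prunes pairs that cannot reach the threshold, so the join stays exact. In this sketch each outer-loop iteration is independent of the others, which is one reason this style of join lends itself to parallel execution.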