Efficient Filter-Based Algorithms for Exact Set Similarity Join on GPUs

Abstract

Set similarity join is a core operation for text data integration, cleaning, and mining. Most state-of-the-art solutions rely on inherently sequential, CPU-based algorithms. In this paper, we propose a parallel algorithm for set similarity joins that harnesses the power of GPU systems through filtering techniques and divide-and-conquer strategies that scale well with data size. We also present parallel algorithms for all data pre-processing phases. The result is an end-to-end solution to the set similarity join problem that runs entirely on the GPU: it receives input text data and outputs pairs of similar strings. Our experimental results on standard datasets show substantial speedups over the fastest algorithms in the literature.
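The abstract names filtering as the core of the approach without spelling out which filters are used. As an illustration only, the sketch below implements two standard filters from the set similarity join literature, prefix filtering and length filtering, under the Jaccard measure J(x, y) = |x ∩ y| / |x ∪ y|, in sequential Python. It is not the authors' GPU algorithm. The choice of Jaccard, the whitespace tokenizer, the frequency-based token ordering, and all function names (`tokenize`, `canonicalize`, `prefix_len`, `set_similarity_join`) are assumptions made for this example; a GPU version would typically parallelize the probing and verification loops, which is not shown here.

```python
from collections import Counter
from math import ceil


def tokenize(text):
    """Turn a string into a set of lowercase, whitespace-delimited tokens (assumed tokenizer)."""
    return set(text.lower().split())


def canonicalize(token_sets):
    """Map tokens to integer ranks by ascending global frequency and sort each set.

    Rare tokens receive small ranks and therefore come first in every sorted set,
    which keeps prefixes selective. Ties are broken by the token string so the
    order is deterministic.
    """
    freq = Counter(tok for s in token_sets for tok in s)
    order = sorted(freq, key=lambda tok: (freq[tok], tok))
    rank = {tok: r for r, tok in enumerate(order)}
    return [sorted(rank[tok] for tok in s) for s in token_sets]


def jaccard(a, b):
    """Jaccard similarity (intersection size over union size) of two non-empty collections."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def prefix_len(size, t):
    """Probing-prefix length for a set of `size` tokens under Jaccard threshold t.

    J(x, y) >= t implies |x & y| >= ceil(t * |x|), so, under the global token
    order, a similar pair must share a token within the first prefix_len(|x|, t)
    tokens of x and the first prefix_len(|y|, t) tokens of y.
    """
    return size - ceil(t * size) + 1


def set_similarity_join(texts, t):
    """Return all index pairs (i, j), i < j, whose token sets have Jaccard >= t."""
    sets = canonicalize([tokenize(s) for s in texts])
    index = {}  # token rank -> ids of already-processed records with it in their prefix
    pairs = []
    # Visit records by increasing size: every indexed record y then has |y| <= |x|.
    for i in sorted(range(len(sets)), key=lambda k: len(sets[k])):
        x = sets[i]
        candidates = set()
        for tok in x[:prefix_len(len(x), t)]:
            for j in index.get(tok, []):
                # Length filter: J(x, y) >= t requires |y| >= t * |x|.
                if len(sets[j]) >= t * len(x):
                    candidates.add(j)
            index.setdefault(tok, []).append(i)
        # Verification: compute the exact similarity only for surviving candidates.
        for j in candidates:
            if jaccard(x, sets[j]) >= t:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    docs = [
        "gpu set similarity join",
        "set similarity join on gpu",
        "parallel prefix filter",
        "exact set similarity join gpu",
    ]
    print(set_similarity_join(docs, 0.6))  # [(0, 1), (0, 3), (1, 3)]
```

Both filters are lossless for Jaccard: the prefix filter discards only pairs that share no token in their rare-token prefixes, and the length filter discards only pairs whose sizes differ too much to reach the threshold. The output therefore matches a brute-force all-pairs comparison while verifying far fewer pairs.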

Bibliographic Record

Source: International Conference on Enterprise Information Systems, 2017, pp. 74-95 (22 pages)
Authors: Rafael David Quirino; Sidney Ribeiro-Junior; Leonardo Andrade Ribeiro; Wellington Santos Martins
Original format: PDF
Keywords: Advanced query processing; High performance computing; Parallel set similarity join; GPU

Similar Literature

1. An empirical evaluation of exact set similarity join techniques using GPUs [J]. Bellas Christos, Gounaris Anastasios. Information Systems, 2020 (Mar. issue).

2. HySet: A hybrid framework for exact set similarity join using a GPU [J]. Bellas Christos, Gounaris Anastasios. Parallel Computing, 2021 (Jul. issue).

3. Leveraging set relations in exact and dynamic set similarity join [J]. Wang Xubo, Qin Lu, Lin Xuemin. The VLDB Journal, 2019, no. 2.

4. Efficient Filter-Based Algorithms for Exact Set Similarity Join on GPUs [C]. Rafael David Quirino, Sidney Ribeiro-Junior, Leonardo Andrade Ribeiro. International Conference on Enterprise Information Systems, 2018.

5. Efficient Algorithms for Frequent Path Finding and Similarity Join in Big Multidimensional Data [D]. Luo, Wuman. 2012.

6. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs [O]. Jacek Blazewicz, Wojciech Frohmberg, Michal Kierzynka. 2011.

7. GPU-based efficient join algorithms on Hadoop [O]. Hongzhi Wang, Ning Li, Zheng Wang. 2020.
