Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing


Abstract

Given a set of records, entity resolution algorithms find all the records referring to each entity. In top-k entity resolution, the goal is to find all the records referring to the k largest entities, measured by number of records. Top-k entity resolution is driven by many modern applications that operate over just the few most popular entities in a dataset. In this paper we introduce the problem of top-k entity resolution and summarize a novel approach to it; full details are presented in a technical report. Our approach is based on locality-sensitive hashing and can process massive datasets very rapidly and accurately. Our key insight is to adaptively decide how much processing each record requires to ascertain whether it refers to a top-k entity: the less likely a record is to refer to a top-k entity, the less it is processed. The heavily reduced amount of processing for the vast majority of records that do not refer to top-k entities leads to significant speedups. Our experiments with images, web articles, and scientific publications show a 2× to 25× speedup compared to traditional approaches for high-dimensional data.
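
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of adaptive, coarse-to-fine hashing, and not the authors' method. It assumes each record is a non-empty set of string tokens and uses MinHash as the locality-sensitive hash family. The names `minhash`, `split`, `adaptive_top_k`, and the `levels` parameter are hypothetical. Buckets are refined lazily, largest first: a bucket's size is an upper bound on the size of any cluster inside it, so small buckets are never hashed beyond the first, cheapest level.

```python
import hashlib
import heapq
from collections import defaultdict


def minhash(tokens, seed):
    """MinHash of a non-empty token set under one seeded hash function."""
    return min(
        hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest()
        for t in tokens
    )


def split(records, ids, seeds):
    """Partition record ids by their MinHash values under the given seeds."""
    groups = defaultdict(list)
    for i in ids:
        groups[tuple(minhash(records[i], s) for s in seeds)].append(i)
    return list(groups.values())


def adaptive_top_k(records, k, levels=((0,), (1, 2), (3, 4, 5, 6))):
    """Return up to k of the largest clusters, refining big buckets first.

    Each level applies more hash functions than the previous one
    (an illustrative schedule, not taken from the paper). A bucket that
    has passed every level is treated as a resolved cluster.
    """
    # Max-heap entries: (-size, unique tie-breaker, next level, record ids).
    heap = [(-len(records), 0, 0, list(range(len(records))))]
    counter = 1
    result = []
    while heap and len(result) < k:
        _, _, level, ids = heapq.heappop(heap)
        if level == len(levels):
            # Fully refined and at least as large as anything still queued.
            result.append(ids)
            continue
        for part in split(records, ids, levels[level]):
            heapq.heappush(heap, (-len(part), counter, level + 1, part))
            counter += 1
    return result
```

Calling `adaptive_top_k(records, 2)` on a list of token sets returns lists of record indices, largest cluster first. In this sketch, records land in the same final cluster only if they agree on every MinHash value, which is a much stricter criterion than a real entity resolution pipeline would use; a practical system would follow the hashing stages with a pairwise similarity check.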


Bibliographic details

Source: IEEE International Conference on Data Engineering, 2019, pp. 1718-1721 (4 pages)
Authors: Vasilis Verroios; Hector Garcia-Molina
Original format: PDF
Keywords: Erbium; Computer bugs; Image resolution; Cameras; Videos; Partitioning algorithms; Hash functions

Similar literature

1. An adaptive mean shift clustering algorithm based on locality-sensitive hashing [J]. Zhang X., Cui Y., Li D. Optik: Zeitschrift fur Licht- und Elektronenoptik = Journal for Light- and Electronoptic, 2012, No. 20.
2. ProgressER: Adaptive Progressive Approach to Relational Entity Resolution [J]. Altowim Yasser, Kalashnikov Dmitri V, Mehrotra Sharad. ACM Transactions on Knowledge Discovery from Data, 2018, No. 3.
3. Stream-based live entity resolution approach with adaptive duplicate count strategy [J]. Ma Kun, Yang Bo. International Journal of Web and Grid Services, 2017, No. 3.
4. Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing [C]. Vasilis Verroios, Hector Garcia-Molina. IEEE International Conference on Data Engineering, 2019.
5. High quality entity resolution with adaptive similarity functions [D]. Turan, Rabia. 2011.
6. CONSULT: accurate contamination removal using locality-sensitive hashing [O]. Eleonora Rachtman, Vineet Bafna, Siavash Mirarab. 2021.
7. Adaptive Connection Strength Models for Relationship-based Entity Resolution [O]. Dmitri V. Kalashnikov, Sharad Mehrotra. 2013.
