IEEE International Conference on Data Engineering

Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing

Abstract

Given a set of records, entity resolution algorithms find all the records referring to each entity. In top-k entity resolution, the goal is to find all the records referring to the k largest entities (in terms of number of records). Top-k entity resolution is driven by many modern applications that operate on only the few most popular entities in a dataset. In this paper we introduce the problem of top-k entity resolution and summarize a novel approach to it; full details are presented in a technical report. Our approach is based on locality-sensitive hashing and can process massive datasets rapidly and accurately. Our key insight is to decide adaptively how much processing each record requires to ascertain whether it refers to a top-k entity: the less likely a record is to refer to a top-k entity, the less it is processed. Because the vast majority of records do not refer to top-k entities, sharply reducing their processing leads to significant speedups. Our experiments with images, web articles, and scientific publications show a 2× to 25× speedup over traditional approaches for high-dimensional data.
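The abstract does not spell out the algorithm (details are deferred to the technical report), so the following is only a hypothetical sketch of the general idea of "less processing for records unlikely to be in a top-k entity", not the authors' method. It swaps in a simple technique: records are dense float vectors, grouped by prefixes of a random-hyperplane (SimHash) signature, and groups are refined best-first by size. Because a group's size bounds the size of every subgroup obtained by computing more hash bits, the largest group can always be refined first, and small groups are never hashed further. The names `topk_groups`, `hyperplane_bit`, and `max_bits` are illustrative assumptions, and treating "same full signature" as "same entity" is a simplification: a real system would use several hash tables and verify candidate matches, since a single signature can split near-duplicates.

```python
import heapq
import random
from collections import defaultdict


def hyperplane_bit(vec, plane):
    """One SimHash bit: which side of a random hyperplane the vector lies on."""
    return 1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0


def topk_groups(vectors, k, max_bits, seed=0):
    """Hypothetical best-first refinement sketch (not the paper's algorithm).

    Returns up to k groups of record indices (largest first) among the groups
    induced by the full max_bits-long signature, plus the number of hash bits
    actually computed. Records in small groups receive few bits.
    """
    if not vectors:
        return [], 0
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(max_bits)]

    bits_computed = 0
    # Heap entries: (-group size, unique tie-breaker, signature depth, members).
    heap = [(-len(vectors), 0, 0, list(range(len(vectors))))]
    counter = 1
    result = []
    while heap and len(result) < k:
        _, _, depth, members = heapq.heappop(heap)
        if depth == max_bits:
            # Fully refined and at least as large as anything left in the heap,
            # so it is the largest remaining final group.
            result.append(members)
            continue
        # Refine this group by one more hash bit.
        children = defaultdict(list)
        for i in members:
            children[hyperplane_bit(vectors[i], planes[depth])].append(i)
            bits_computed += 1
        for child in children.values():
            heapq.heappush(heap, (-len(child), counter, depth + 1, child))
            counter += 1
    return result, bits_computed


if __name__ == "__main__":
    # Five copies of one record and two copies of its opposite direction.
    data = [[1.0, 0.0]] * 5 + [[-1.0, 0.0]] * 2
    print(topk_groups(data, k=1, max_bits=4))  # ([[0, 1, 2, 3, 4]], 22)
```

In the usage example the two small records are hashed with only one bit before the top-1 group is confirmed, so 22 bits are computed instead of the 28 needed to hash every record fully; this mirrors, in miniature, the adaptive saving the abstract describes.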
