首页> 外文期刊>Neural computing & applications >Efficient locality-sensitive hashing over high-dimensional streaming data
【24h】

Efficient locality-sensitive hashing over high-dimensional streaming data

机译:对高维流数据进行高效的局部敏感哈希处理

获取原文
获取原文并翻译 | 示例
获取外文期刊封面目录资料

摘要

Approximate nearest neighbor (ANN) search in high-dimensional spaces is fundamental in many applications. Locality-sensitive hashing (LSH) is a well-known methodology to solve the ANN problem. Existing LSH-based ANN solutions typically employ a large number of individual indexes optimized for searching efficiency. Updating such indexes might be impractical when processing high-dimensional streaming data. In this paper, we present a novel disk-based LSH index that offers efficient support for both searches and updates. The contributions of our work are threefold. First, we use the write-friendly LSM-trees to store the LSH projections to facilitate efficient updates. Second, we develop a novel estimation scheme to estimate the number of required LSH functions, with which the disk storage and access costs are effectively reduced. Third, we exploit both the collision number and the projection distance to improve the efficiency of candidate selection, improving the search performance with theoretical guarantees on the result quality. Experiments on four real-world datasets show that our proposal outperforms the state-of-the-art schemes.
机译:高维空间中的近似最近邻 (ANN) 搜索是许多应用的基础。局部敏感哈希 (LSH) 是解决 ANN 问题的一种众所周知的方法。现有的基于 LSH 的 ANN 解决方案通常采用大量针对搜索效率进行优化的单个索引。在处理高维流数据时,更新此类索引可能不切实际。在本文中,我们提出了一种新型的基于磁盘的LSH索引,该索引为搜索和更新提供了有效的支持。我们的工作有三方面的贡献。首先,我们使用可写的 LSM 树来存储 LSH 投影,以促进高效更新。其次,我们开发了一种新的估计方案来估计所需的LSH功能的数量,从而有效地降低了磁盘存储和访问成本。第三,利用碰撞数和投影距离来提高候选选择效率,提高搜索性能,对结果质量有理论保证。在四个真实世界数据集上的实验表明,我们的提案优于最先进的方案。

著录项

获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号