首页> 外文期刊>ACM transactions on database systems >Efficient and Accurate Nearest Neighbor and Closest Pair Search in High-Dimensional Space
【24h】

Efficient and Accurate Nearest Neighbor and Closest Pair Search in High-Dimensional Space

机译:高维空间中高效,准确的最近邻和最近对搜索

获取原文
获取原文并翻译 | 示例

摘要

Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (ⅰ) it can be easily incorporated in a relational database, and (ⅱ) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query cost, or abandon its theoretical guarantee on the quality of query results.rnMotivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms a LSB-forest that has strong quality guarantees, but improves dramatically the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space,supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (ⅰ) iDistance (a famous technique for exact NN search) by two orders of magnitude, and (ⅱ) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results.rnAs a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, which fails most of the existing solutions. We show that, using a LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, yet still ensuring very good quality. In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial reduction in the space and running time. In our experiments, our technique was faster: (ⅰ) than distance browsing (a well-known method for solving the problem exactly) by several orders of magnitude, and (ⅱ) than D-shift (an approximate approach with theoretical guarantees in low-dimensional space) by one order of magnitude, and at the same time, outputs better results.
机译:高维空间中的最近邻(NN)搜索是许多应用程序中的重要问题。从数据库角度来看,一个好的解决方案需要具有两个属性:(ⅰ)可以轻松地将其合并到关系数据库中;(ⅱ)无论数据和查询分布如何,其查询成本都应随数据集大小而线性增加。局部敏感哈希(LSH)是一种既满足这两个要求又广为人知的方法,但是其当前的实现要么会产生昂贵的空间和查询成本,要么就放弃了对查询结果质量的理论保证。一种称为“局部敏感B树”(LSB-tree)的访问方法,可以在关系数据库中进行快速,准确的高维NN搜索。几个LSB树的组合形成一个LSB​​林,它具有强大的质量保证,但是可以大大提高以前具有相同保证的LSH实现的效率。实际上,LSB树本身也是有效的索引,它消耗线性空间,支持有效的更新并提供准确的查询结果。在我们的实验中,LSB树的速度比:(ⅰ)iDistance(精确的NN搜索的著名技术)快两个数量级,以及(ⅱ)MedRank(一种具有非平凡质量保证的最新近似方法)快一个数量级。下一步,我们扩展了LSB技术,以解决另一个经典问题,即高维空间中的最接近对(CP)搜索。对于该问题的长期挑战是在非常高的尺寸下实现二次运行时间,这使大多数现有解决方案失败了。我们表明,使用LSB森林,可以在(最坏情况下)的时间(大大低于二次复杂度)内完成CP搜索,但仍然可以确保非常好的质量。实际上,仅使用两个LSB树就可以找到准确的答案,从而大大减少了空间和运行时间。在我们的实验中,我们的技术更快:(ⅰ)比距离浏览(一种精确解决问题的众所周知的方法)快了几个数量级,并且(ⅱ)比D-shift(一种具有较低理论保证的近似方法)快维空间)缩小一个数量级,同时输出更好的结果。

著录项

  • 来源
    《ACM transactions on database systems》 |2010年第3期|P.20.1-20.46|共46页
  • 作者单位

    Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, Hong Kong;

    rnDepartment of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay,Hong Kong;

    rnDepartment of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, Hong Kong;

    rnDivision of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia;

  • 收录信息 美国《科学引文索引》(SCI);美国《工程索引》(EI);
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    locality-sensitive hashing; nearest neighbor search; closest pair search;

    机译:局部敏感哈希最近邻居搜索;最近对搜索;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号