Venue: International Conference on Similarity Search and Applications

Confirmation Sampling for Exact Nearest Neighbor Search


Abstract

Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC '98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings. We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures that each find the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability 1 - δ, using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of Ω(L) trees with internal parameters specifically tuned to the query and data.
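The general reduction described above can be illustrated with a minimal sketch. The abstract does not state the stopping rule, so the one used here is an assumption: keep querying independent data structures, track the closest candidate seen so far, and stop once that candidate has been returned a chosen number of times. The names `confirmation_sampling`, `structures`, and `confirmations` are hypothetical, and how `confirmations` should depend on δ is left open because the abstract does not give it.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

# A candidate is (distance to the query, point id).
Candidate = Tuple[float, Hashable]


def confirmation_sampling(
    structures: Iterable[Callable[[], Optional[Candidate]]],
    confirmations: int,
) -> Tuple[Optional[Candidate], int]:
    """Query structures one by one until the best candidate is confirmed.

    Each element of `structures` is a zero-argument callable that runs the
    query against one independent data structure and returns its closest
    candidate, or None if it found nothing.

    Assumed stopping rule (not taken from the abstract): stop as soon as the
    current best candidate has been returned `confirmations` times.
    Returns the best candidate and the number of structures queried.
    """
    best: Optional[Candidate] = None
    times_seen = 0
    queries_used = 0
    for query_structure in structures:
        queries_used += 1
        candidate = query_structure()
        if candidate is None:
            continue
        if best is None or candidate[0] < best[0]:
            # A strictly closer point resets the confirmation count.
            best, times_seen = candidate, 1
        elif candidate[1] == best[1]:
            # The same point was found again by another structure.
            times_seen += 1
        if times_seen >= confirmations:
            break
    return best, queries_used
```

With this rule, the number of structures queried adapts to the query: a nearest neighbor that each structure finds with high probability is confirmed after a few queries, while a hard query consumes more structures before stopping, or exhausts them all.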
