首页> 外文期刊>JMLR: Workshop and Conference Proceedings >Sparse Randomized Partition Trees for Nearest Neighbor Search
【24h】

Sparse Randomized Partition Trees for Nearest Neighbor Search

机译:用于最近邻居搜索的稀疏随机分区树

获取原文
           

摘要

Randomized partition trees have recently been shown to be very effective in solving nearest neighbor search problem. In spite of enjoying strong theoretical guarantee, it suffers from high space complexity, since each internal node of the tree needs to store a $d$ dimensional projection direction leading to a $O(nd)$ space complexity for a dataset of size $n$. Inspired by the fast Johnson-Lindenstrauss transform, in this paper, we propose a sparse version of randomized partition tree where each internal node needs to store only a few non-zero entries, as opposed to all $d$ entries, leading to significant space savings without sacrificing much in terms of nearest neighbor search accuracy. As a by product of this, query time of our proposed method is slightly better than that of its non-sparse counterpart for large dataset size. Our theoretical results indicate that our proposed method enjoys the same theoretical guarantee as that of the original non-sparse RP-tree. Experimental evaluations on four real world dataset strongly suggest that nearest neighbor search performance of our proposed sparse RP-tree is very similar to that of its non-sparse counterpart in terms of accuracy and number of retrieved points.
机译:最近已经证明,随机分区树在解决最近邻居搜索问题方面非常有效。尽管享有强大的理论保证,但它仍然具有较高的空间复杂度,因为树的每个内部节点都需要存储$ d $维投影方向,从而导致大小为$ n的数据集具有$ O(nd)$空间复杂度$。受快速约翰逊-林登斯特劳斯(Johnson-Lindenstrauss)转换的启发,我们提出了一种稀疏版本的随机分区树,其中每个内部节点只需要存储几个非零条目,而不是所有$ d $条目,从而导致大量空间在不牺牲最近邻搜索精度的情况下节省了成本。因此,对于大数据集,我们提出的方法的查询时间比其非稀疏方法的查询时间略短。我们的理论结果表明,我们提出的方法与原始的非稀疏RP树具有相同的理论保证。对四个现实世界数据集的实验评估强烈表明,我们提出的稀疏RP树的最近邻居搜索性能在准确性和检索点数方面与非稀疏RP树非常相似。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号