首页> 外文会议>2011 IEEE International Conference on Multimedia and Expo >Grassmann Hashing for approximate nearest neighbor search in high dimensional space
【24h】

Grassmann Hashing for approximate nearest neighbor search in high dimensional space

机译:Grassmann Hashing在高维空间中进行近似最近邻搜索

获取原文
获取外文期刊封面目录资料

摘要

Locality-Sensitive Hashing (LSH) approximates nearest neighbors in high dimensions by projecting original data into low-dimensional subspaces. The basic idea is to hash data samples to ensure that the probability of collision is much higher for samples that are close to each other than for those that are far apart. However, by applying k random hashing functions on original data, LSH fails to find the most discriminant hashing-subspaces, so the nearest neighbor approximation is inefficient. To alleviate this problem, we propose the Grassmann Hashing (GRASH) for approximate nearest neighbor search in high dimensions. GRASH first introduces a set of subspace candidates from Linear Discriminant Analysis (LDA). Then it applies Grassmann metric to select the optimal subspaces for hashing. Finally, it generates hashing codes based on non-uniform bucket size design motivated by Lloyd-Max quantization. The proposed GRASH model enjoys a number of merits: 1) GRASH introduces the Grassmann metric to measure the similarity between different hashing subspaces, so the hashing function can better capture the data diversity; 2) GRASH obtains the subspace candidates from LDA, so it incorporates the discriminant information into the hashing functions; 3) GRASH extends LSH's 1-d hashing subspaces to m-d, i.e. it is a multidimensional extension of hashing approximation; 4) motivated by Lloyd-Max quantization, GRASH applies non-uniform size bucket to generate hashing codes, so the distortion can be minimized. Experimental results on a number of datasets confirm the validity of our proposed model.
机译:局部敏感散列(LSH)通过将原始数据投影到低维子空间中来近似高维中的最近邻居。基本思想是对数据样本进行哈希处理,以确保相互靠近的样本比相距较远的样本发生碰撞的可能性高得多。但是,通过对原始数据应用k个随机散列函数,LSH无法找到最有区别的散列子空间,因此最近的邻居近似效率不高。为了缓解此问题,我们提出了Grassmann Hashing(GRASH),用于在高维中进行近似最近的邻居搜索。 GRASH首先从线性判别分析(LDA)引入了一组子空间候选对象。然后,它应用Grassmann度量来选择用于散列的最佳子空间。最后,它基于Lloyd-Max量化的非均匀存储桶大小设计生成哈希码。提出的GRASH模型具有许多优点:1)GRASH引入了Grassmann度量来度量不同哈希子空间之间的相似性,因此哈希函数可以更好地捕获数据多样性。 2)GRASH从LDA获取候选子空间,因此将判别信息合并到哈希函数中; 3)GRASH将LSH的1-d哈希子空间扩展到m-d,即它是哈希近似的多维扩展; 4)在Lloyd-Max量化的推动下,GRASH应用非均匀大小的存储桶生成哈希码,因此可以将失真最小化。在许多数据集上的实验结果证实了我们提出的模型的有效性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号