首页> 中文期刊> 《电子学报》 >M2LSH:基于LSH的高维数据近似最近邻查找算法

M2LSH:基于LSH的高维数据近似最近邻查找算法

         

摘要

在许多应用中,LSH(Locality Sensitive Hashing)以及各种变体,是解决近似最近邻问题的有效算法之一.虽然这些算法能够很好地处理分布比较均匀的高维数据,但从设计方案来看,都没有针对数据分布不均匀的情况做相应的优化.针对这一问题,本文提出了一种新的基于LSH的解决方案(M2LSH,2 Layers Merging LSH),对于数据分布不均匀的情况依然能得到一个比较好的查询效果.首先,将数据存放到具有计数功能的组合哈希向量表示的哈希桶中,然后通过二次哈希将这些桶号投影到一维空间,在此空间根据各个桶中存放的数据个数合并相邻哈希桶,使得新哈希桶中的数据量能够大致均衡.查询时仅访问有限个哈希桶,就能找到较优结果.本文给出了详细的理论分析,并通过实验验证了M2LSH的性能,不仅能减少访问时间,也可提高结果的正确率.%The LSH (Locality Sensitive Hashing) method and its variants represent the state-of-the-art indexing techniques to process ANN (Approximate Nearest Neighbor) searches in a high dimensional data space.Although the existing LSH based techniques can efficiently deal with uniform distributed data,it is noticed from the principle of their design schemes that these techniques cannot handle non-uniform distributed data well.To tackle the above problem,this paper presents a new LSH based technique,called M2LSH (2 Layers Merging LSH),to efficiently process ANN searches on non-uniform distributed data.The key idea is as follows.First,data objects are stored in a hash table with multiple buckets (i.e.,the first layer),each of which corresponds to a combined hash vector used as its hash number.The count of data objects in each hash bucket is recorded.Secondly,using the hash functions for a second layer hash table,the bucket numbers from the first layer are projected into a one-dimensional space.In this space,some adjacent hash buckets from the first layer are merged so as to make the data objects uniformly distributed in the merged buckets at the second layer.Therefore,the M2LSH can not only improve the searching efficiency but also increase the accuracy of the search results.This paper also provides a detailed theoretical accuracy analysis for the M2LSH technique.Experiments using synthetic and real data show that the theoretical estimates are quite accurate,and the proposed M2LSH technique can efficiently process ANN searches with low false positive and negative rates.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号