首页> 外文会议>ACMKDD International Conference on Knowledge Discovery and Data Mining;KDD 2008 >Locality Sensitive Hash Functions Based on Concomitant Rank Order Statistics
【24h】

Locality Sensitive Hash Functions Based on Concomitant Rank Order Statistics

机译:基于伴随的秩序统计的局部敏感哈希函数

获取原文

摘要

Locality Sensitive Hash functions are invaluable tools for approximate near neighbor problems in high dimensional spaces. In this work, we are focused on LSH schemes where the similarity metric is the cosine measure. The contribution of this work is a new class of locality sensitive hash functions for the cosine similarity measure based on the theory of concomitants, which arises in order statistics. Consider n i.i.d sample pairs, {(X_1, Y_1), (X_2, Y_2),..., (X_n, Y_n)} obtained from a bivariate distribution f(X,Y). Concomitant theory captures the relation between the order statistics of X and Y in the form of a rank distribution given by Prob(Rank(Y_1)=j|Rank(X_i)=k). We exploit properties of the rank distribution towards developing a locality sensitive hash family that has excellent collision rate properties for the cosine measure.The computational cost of the basic algorithm is high for high hash lengths. We introduce several approximations based on the properties of concomitant order statistics and discrete transforms that perform almost as well, with significantly reduced computational cost. We demonstrate the practical applicability of our algorithms by using it for finding similar images in an image repository.
机译:局部敏感哈希函数是解决高维空间中的近似邻居问题的宝贵工具。在这项工作中,我们专注于LSH方案,其中相似性度量是余弦度量。这项工作的贡献是基于伴随统计理论的余弦相似度度量的一类新的局部敏感哈希函数。考虑从二元分布f(X,Y)获得的n个i.i.d样本对{(X_1,Y_1),(X_2,Y_2),...,(X_n,Y_n)}。伴随理论以Prob(Rank(Y_1)= j | Rank(X_i)= k)给出的秩分布的形式捕获X和Y的顺序统计量之间的关系。我们利用秩分布的特性来开发对余弦量度具有出色碰撞率特性的局部敏感哈希家族。 对于高散列长度,基本算法的计算成本很高。我们基于伴随的阶跃统计和离散变换的性能介绍了几种近似方法,它们的性能几乎一样好,并且显着降低了计算成本。我们通过在图像存储库中查找相似图像来证明我们算法的实际适用性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号