Journal: 计算机科学 (Computer Science)

Research on Optimal Fractional Bit Minwise Hashing

Abstract

In information retrieval, the minwise hashing algorithm is often used to estimate the similarity between sets, for example between documents. b-bit minwise hashing estimates the similarity while storing only the lowest b bits of each (minwise) hashed value (e.g., b = 1 or 2), which yields substantial savings in storage space and computation time. Fractional bit minwise hashing offers a wider range of choices for trading accuracy against storage space. For a fixed fractional bit f, there are many ways to construct f from combinations of bits. We theoretically analyze a limited set of fractional bit combinations and derive the optimal fractional bit. Extensive experiments demonstrate the effectiveness of this method.
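The b-bit idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not cover the fractional bit construction. The function names (`hash_value`, `b_bit_signature`, `estimate_resemblance`) are hypothetical, a salted BLAKE2b hash stands in for a random permutation, and the estimator assumes the sparse-data approximation in which two unrelated b-bit values collide with probability about 1/2^b.

```python
import hashlib


def hash_value(x, seed):
    """Illustrative stand-in for one random permutation: salted 64-bit hash."""
    digest = hashlib.blake2b(
        x.to_bytes(8, "little"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def b_bit_signature(items, k, b):
    """For each of k hash functions, keep only the lowest b bits of the minimum."""
    mask = (1 << b) - 1
    return [min(hash_value(x, seed) for x in items) & mask for seed in range(k)]


def estimate_resemblance(sig1, sig2, b):
    """Estimate resemblance from the fraction of matching b-bit values.

    Assumption: sparse sets, so chance collisions happen with probability
    c = 1 / 2**b, and the match probability is roughly c + (1 - c) * R.
    """
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    p_hat = matches / len(sig1)
    c = 1.0 / (1 << b)
    return (p_hat - c) / (1.0 - c)


if __name__ == "__main__":
    A = set(range(0, 200))
    B = set(range(100, 300))  # true resemblance is 100 / 300
    sig_a = b_bit_signature(A, k=500, b=2)
    sig_b = b_bit_signature(B, k=500, b=2)
    print(estimate_resemblance(sig_a, sig_b, b=2))
```

Each signature entry here costs b bits instead of 64, which is the storage saving the abstract refers to. The price is a noisier estimate for the same number of hash functions k, and this is the accuracy versus storage trade-off that a fractional bit setting is meant to tune more finely.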
