In information retrieval, the minwise hashing algorithm is used to estimate the similarity between sets, such as documents. b-bit minwise hashing estimates similarity while storing only the lowest b bits of each (minwise) hashed value (e.g., b = 1 or 2), which saves substantial storage space and computation time. Fractional-bit minwise hashing offers a wider range of trade-offs between accuracy and storage requirements. For a given fractional bit f, there are many ways to construct f. We theoretically analyze a limited set of fractional-bit combinations and derive the optimal fractional bit. Extensive experiments demonstrate the effectiveness of this method.
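To make the b-bit idea concrete, here is a minimal Python sketch of minwise hashing and its b-bit variant. It is an illustration under stated assumptions, not the paper's implementation: the integer mixer `mix64`, the salt-based hash family, and all function names are hypothetical choices, and the b-bit estimator uses the common sparse-data approximation in which two unequal minwise values agree on their lowest b bits with probability about 2^-b. The fractional-bit construction analyzed in the paper is not reproduced here.

```python
import random

MASK64 = (1 << 64) - 1


def mix64(z):
    """Bijective 64-bit integer mixer (splitmix64-style); an illustrative choice."""
    z = (z + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def make_salts(k, seed=0):
    """Draw k random 64-bit salts; each salt defines one hash function."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(k)]


def minhash_signature(items, salts):
    """Return, for each salt, the minimum hash over a non-empty set of
    non-negative integers below 2**64."""
    return [min(mix64(x ^ s) for x in items) for s in salts]


def lowest_bits(signature, b):
    """Keep only the lowest b bits of every minwise hash value."""
    mask = (1 << b) - 1
    return [h & mask for h in signature]


def match_fraction(sig1, sig2):
    """Fraction of positions at which two signatures agree."""
    return sum(h1 == h2 for h1, h2 in zip(sig1, sig2)) / len(sig1)


def estimate_resemblance(sig1, sig2):
    """Full-precision minwise estimate of the resemblance (Jaccard similarity)."""
    return match_fraction(sig1, sig2)


def estimate_resemblance_b_bit(sig1, sig2, b):
    """b-bit estimate, assuming sparse data: P_b ~= 2**-b + (1 - 2**-b) * R.

    The result can fall slightly below 0 for nearly disjoint sets.
    """
    chance = 1.0 / (1 << b)  # accidental agreement of b-bit values
    p_hat = match_fraction(lowest_bits(sig1, b), lowest_bits(sig2, b))
    return (p_hat - chance) / (1.0 - chance)
```

A usage sketch: build the signatures once with `salts = make_salts(1024)`, then compare `estimate_resemblance(sig_a, sig_b)` with `estimate_resemblance_b_bit(sig_a, sig_b, 2)`. The b-bit version stores 2 bits per hash instead of 64, at the price of a higher estimator variance for the same number of hashes.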