REDUCING USE OF RANDOMNESS IN CONSISTENT UNIFORM HASHING
Abstract
Near-duplicate documents may be identified using techniques involving consistent uniform hashing. A sequence of bits may be generated and subsequently used in comparison techniques that determine near-duplicate documents: a biased bit may be placed in the leading position of the sequence, and unbiased bits may be used in the positions that follow it. Samples may be used collectively, rather than individually, in the generation of biased bits. Sequences of bits may thus be produced for multiple samples at once instead of one sample at a time, amortizing the cost of generating randomness across the samples. Fewer than one bit of randomness per sample may be used.
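The amortization idea can be illustrated with a minimal sketch. It is not the method claimed in the patent, which the abstract does not spell out. It assumes one simple way to generate biased leading bits collectively: draw the gap to the next 1-bit from a geometric distribution, so the number of pseudo-random draws grows with the number of 1-bits rather than with the number of samples. The names `biased_leading_bits`, `bit_sequence`, `group_key`, `p`, and `width` are hypothetical, and the sketch counts draws, not literal bits of entropy.

```python
import hashlib
import math
import random


def biased_leading_bits(group_key: str, n: int, p: float):
    """Return (bits, draws): n leading bits, each 1 with probability p.

    Assumption: bits for the whole group are produced together by skipping
    geometrically distributed runs of zeros, one draw per run.
    Seeding from group_key keeps the result consistent across documents.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    seed = int.from_bytes(hashlib.sha256(group_key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    bits = [0] * n
    draws = 0
    i = -1
    while True:
        u = rng.random()  # uniform in [0, 1)
        draws += 1
        # Number of zeros before the next one: floor(ln(1 - u) / ln(1 - p)).
        gap = int(math.log(1.0 - u) / math.log(1.0 - p))
        i += gap + 1
        if i >= n:
            break
        bits[i] = 1
    return bits, draws


def bit_sequence(group_key: str, index: int, leading_bit: int, width: int = 8):
    """Biased leading bit followed by width - 1 hash-derived (unbiased) bits."""
    digest = hashlib.sha256(f"{group_key}:{index}".encode()).digest()
    tail = [(digest[k // 8] >> (k % 8)) & 1 for k in range(width - 1)]
    return [leading_bit] + tail
```

For example, `bits, draws = biased_leading_bits("group-0", 1000, 0.05)` uses one draw per 1-bit plus a final draw that runs past the end, so `draws` is typically far smaller than the 1000 samples covered; `bit_sequence("group-0", 3, bits[3])` then builds the full sequence for one sample.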