REDUCING USE OF RANDOMNESS IN CONSISTENT UNIFORM HASHING

Abstract

Documents that are near-duplicates may be determined using techniques involving consistent uniform hashing. A biased bit may be placed in the leading position of a sequence of bits that may be generated and subsequently used in comparison techniques to determine near-duplicate documents. Unbiased bits may be used in subsequent positions of the sequence of bits, after the biased bit, for use in comparison techniques. Samples may be used collectively, as opposed to individually, in the generation of biased bits. Sequences of bits may thus be produced not on a single sample basis, but for multiple samples, thereby amortizing the cost of generating randomness for the samples. Less than one bit of randomness per sample may be used.
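The amortization idea can be illustrated with a small sketch. It is not taken from the patent text. It shows one possible way to produce independent biased leading bits for a whole block of samples while consuming, on average, well under one fair bit per sample. The names `BitSource`, `bernoulli`, and `leading_bits`, the block size `m`, and the bias `p` are all hypothetical. The sketch also assumes that a leading bit is 1 with small probability `p`, and that the fair bits come from a seeded generator; a consistent-hashing system would presumably derive them deterministically from the document content instead.

```python
import random


class BitSource:
    """Fair-bit source that counts how many bits were consumed (illustrative)."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.used = 0

    def bit(self):
        self.used += 1
        return self._rng.getrandbits(1)


def bernoulli(q, src):
    """Return True with probability q, using about two fair bits on average.

    Builds a uniform number U bit by bit and compares it with the binary
    expansion of q, stopping at the first position where the bits differ.
    """
    while True:
        q *= 2.0
        q_bit = 1 if q >= 1.0 else 0
        q -= q_bit
        u_bit = src.bit()
        if u_bit != q_bit:
            return u_bit < q_bit  # U < q exactly when U has 0 where q has 1


def leading_bits(m, p, src):
    """Independent biased bits (1 with probability p) for m samples at once.

    Assumption of this sketch: m is a power of two.
    """
    assert m > 0 and m & (m - 1) == 0, "m must be a power of two"
    out = [0] * m

    def fill(lo, size, known_nonempty):
        # P(every bit in this range is 0) is (1 - p) ** size.
        if not known_nonempty and bernoulli((1.0 - p) ** size, src):
            return  # whole range is zero, decided with one cheap test
        if size == 1:
            out[lo] = 1
            return
        half = size // 2
        h = (1.0 - p) ** half
        # Given at least one 1 in the range, P(left half is all zero) = h / (1 + h).
        if bernoulli(h / (1.0 + h), src):
            fill(lo + half, half, True)   # the 1 must be in the right half
        else:
            fill(lo, half, True)          # left half holds at least one 1
            fill(lo + half, half, False)  # right half is independent of the left

    fill(0, m, False)
    return out
```

With a small `p`, most blocks are settled by the single "is the whole block zero?" test, so the cost of that test is shared by all `m` samples. Under the same assumptions, the unbiased bits in later positions could be drawn one at a time from `src.bit()`, and only for samples whose comparison actually needs them.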