Conference on Neural Information Processing Systems

Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling


Abstract

Jaccard similarity is widely used as a distance measure in many machine learning and search applications. Typically, hashing methods are essential for the use of Jaccard similarity to be practical in large-scale settings. For hashing binary (0/1) data, the idea of one permutation hashing (OPH) with densification significantly accelerates traditional minwise hashing algorithms while providing unbiased and accurate estimates. In this paper, we propose a "re-randomization" strategy in the process of densification and we show that it achieves the smallest variance among existing densification schemes. The success of this idea inspires us to generalize one permutation hashing to weighted (non-binary) data, resulting in the so-called "bin-wise consistent weighted sampling (BCWS)" algorithm. We analyze the behavior of BCWS and compare it with a recent alternative. Experiments on a range of datasets and tasks confirm the effectiveness of the proposed methods. We expect that BCWS will be adopted in practice for training kernel machines and fast similarity search.
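The abstract gives no algorithmic detail, so the following is only a minimal sketch of one way OPH with a re-randomized densification step could look for binary data. It illustrates the general idea and is not the authors' algorithm. Every name here (`_h`, `oph_sketch`, `estimate_jaccard`, `dim`, `num_bins`, `seed`) is hypothetical, and so are the choices of a BLAKE2b-based hash for picking donor bins and of seeding the re-hash with the empty bin's index.

```python
import hashlib
import random


def _h(*parts):
    """Deterministic 64-bit hash of the given values (illustrative helper)."""
    data = ":".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def oph_sketch(items, dim, num_bins, seed=0):
    """Sketch a set of integers in [0, dim) using one shared permutation.

    Assumption: empty bins are filled from a pseudo-randomly chosen non-empty
    "donor" bin, and the donor's elements are re-hashed with a hash keyed on
    the empty bin's index instead of copying the donor's sample.
    """
    items = set(items)
    if not items:
        raise ValueError("cannot densify the sketch of an empty set")
    if dim % num_bins:
        raise ValueError("dim must be divisible by num_bins")

    # One permutation of the universe, split into num_bins equal-width bins.
    perm = list(range(dim))
    random.Random(seed).shuffle(perm)
    bin_size = dim // num_bins
    bins = [[] for _ in range(num_bins)]
    for x in items:
        bins[perm[x] // bin_size].append(x)

    sketch = []
    for k, members in enumerate(bins):
        if members:
            # Non-empty bin: keep the element with the smallest permuted position.
            sketch.append(min(members, key=lambda x: perm[x]))
            continue
        # Empty bin: probe bins in an order that depends only on (seed, k),
        # so two sets sketched with the same seed probe identically.
        attempt = 0
        while not bins[_h("bin", seed, k, attempt) % num_bins]:
            attempt += 1
        donor = bins[_h("bin", seed, k, attempt) % num_bins]
        # Re-randomize: rank the donor's elements with a hash keyed on k.
        sketch.append(min(donor, key=lambda x: _h("rehash", seed, k, x)))
    return sketch


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of bins whose samples agree."""
    return sum(a == b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)


if __name__ == "__main__":
    S = set(range(0, 60))
    T = set(range(30, 90))  # exact Jaccard similarity is 30 / 90 = 1/3
    a = oph_sketch(S, dim=1024, num_bins=256)
    b = oph_sketch(T, dim=1024, num_bins=256)
    print("estimated Jaccard:", estimate_jaccard(a, b))
```

Both sets must be sketched with the same `dim`, `num_bins`, and `seed` for the comparison to be meaningful. Each sample is stored as the selected element itself, so two disjoint sets can never agree in any bin.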
