Theory and Applications of b-Bit Minwise Hashing



Abstract

Efficient (approximate) computation of set similarity in very large datasets is a common task with many applications in information retrieval and data management. One common approach to this task is minwise hashing. This paper describes b-bit minwise hashing, which in practice can improve on the original scheme's storage requirements and computational overhead by an order of magnitude. We give both a theoretical characterization of the new algorithm's performance and a practical evaluation on large real-life datasets, and show that the two match very closely. Moreover, we provide a detailed comparison with other important techniques proposed for estimating set similarities. Our technique yields a very simple algorithm and can be realized with only minor modifications to the original minwise hashing scheme.
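To make the idea concrete, here is a minimal sketch of b-bit minwise hashing in Python. It is an illustration under stated assumptions, not the authors' implementation: salted BLAKE2 hashes stand in for random permutations, the sets are assumed non-empty and sparse relative to the universe (so two unrelated minima agree on their lowest b bits with probability of roughly 1/2^b), and the names `bbit_signature` and `estimate_resemblance` are hypothetical.

```python
import hashlib


def _hash(salt: int, item: str) -> int:
    """64-bit salted hash; an assumed stand-in for a random permutation."""
    data = f"{salt}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def bbit_signature(items, k=400, b=2):
    """For each of k salted hashes, keep only the lowest b bits of the minimum.

    Assumes `items` is a non-empty collection of strings.
    """
    mask = (1 << b) - 1
    return [min(_hash(salt, x) for x in items) & mask for salt in range(k)]


def estimate_resemblance(sig1, sig2, b=2):
    """Estimate Jaccard resemblance from two b-bit signatures.

    Uses the sparse-data approximation: the observed match rate is
    corrected for accidental b-bit collisions, which are assumed to
    occur with probability 1 / 2**b.
    """
    matches = sum(x == y for x, y in zip(sig1, sig2))
    p_hat = matches / len(sig1)
    chance = 1.0 / (1 << b)
    return (p_hat - chance) / (1.0 - chance)


# Two sets of 300 items sharing 100, so the true resemblance is 100/500 = 0.2.
set_a = {f"w{i}" for i in range(0, 300)}
set_c = {f"w{i}" for i in range(200, 500)}

sig_a = bbit_signature(set_a)
sig_c = bbit_signature(set_c)
estimate = estimate_resemblance(sig_a, sig_c)
print(f"estimated resemblance: {estimate:.3f}")
```

Each signature entry here needs only 2 bits rather than a full 64-bit minimum, which is where the storage saving comes from; the price is a noisier estimate for a fixed number of hashes k, so in practice k would be chosen larger than for full-width minwise hashing.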

Bibliographic details

  • Source
    Communications of the ACM | 2011, No. 8 | pp. 101-108 | 8 pages
  • Authors

    Ping Li; Arnd Christian König

  • Affiliations

    Department of Statistical Science, Faculty of Computing and Information Science, Cornell University, Ithaca, NY;

    Microsoft Research, Microsoft Corporation, Redmond, WA;

  • Original format: PDF
  • Language: English
