
BitHash: An efficient bitwise Locality Sensitive Hashing method with applications


Abstract

Locality Sensitive Hashing has been applied to detecting near-duplicate images, videos and web documents. In this paper we present a bitwise Locality Sensitive Hashing method, BitHash, that uses only one bit per hash value; the storage space for hash values is significantly reduced, and the estimator can be computed much faster. The method provides an unbiased estimate of pairwise Jaccard similarity, and the estimator is a simple linear function of Hamming distance. We rigorously analyze the variance of One-Bit Min-Hash (BitHash), showing that for high Jaccard similarity BitHash may provide accurate estimation, and that as the pairwise Jaccard similarity increases, the variance ratio of BitHash over the original min-hash decreases. Furthermore, BitHash compresses each data sample into a compact binary hash code while preserving the pairwise similarity of the original data. The binary code can be used as a compressed and informative representation in place of the original data for subsequent processing. For example, it can be naturally integrated with a classifier such as an SVM. We apply BitHash to two typical applications, near-duplicate image detection and sentiment analysis. Experiments on a real user's photo collection and a popular sentiment analysis data set show that the classification accuracy of our proposed method in both applications can approach that of the state-of-the-art method, while BitHash requires significantly less storage space. © 2016 Elsevier B.V. All rights reserved.
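
The abstract does not give the construction, so the following is only a minimal sketch of one way such an estimator can work, not the authors' implementation. It assumes each min-hash value is reduced to its lowest bit, so that two sets with Jaccard similarity J agree on a given bit with probability (1 + J) / 2. Under that assumption, J can be estimated as 1 - 2H/k, where H is the Hamming distance between two k-bit signatures, and the variance would be (1 - J^2)/k against J(1 - J)/k for plain min-hash. The names `_hash64`, `one_bit_signature` and `estimate_jaccard` are hypothetical, and salted BLAKE2b stands in for random permutations.

```python
import hashlib


def _hash64(seed: int, item: str) -> int:
    """64-bit hash of `item` under the seed-th hash function.

    Assumption: salted BLAKE2b is used as a stand-in for a random permutation.
    """
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def one_bit_signature(items, k=1024):
    """Return k bits for a non-empty set of strings.

    Each bit is the lowest bit of the minimum hash value of the set
    under one of k hash functions (an assumed one-bit reduction).
    """
    return [min(_hash64(seed, x) for x in items) & 1 for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as a linear function of Hamming distance.

    The result is not clamped, so it can be slightly negative for
    dissimilar sets; clamping would bias the estimate.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    return 1.0 - 2.0 * hamming / len(sig_a)


if __name__ == "__main__":
    a = {str(i) for i in range(300)}
    b = {str(i) for i in range(100, 400)}  # true Jaccard similarity is 0.5
    print(estimate_jaccard(one_bit_signature(a), one_bit_signature(b)))
```

In this sketch each signature takes k bits instead of k full hash values, and the Hamming distance could be computed with XOR and a population count if the bits were packed into machine words.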

Bibliographic record

  • Source
    Knowledge-Based Systems, 2016, Issue 1, pp. 40-47 (8 pages)
  • Author affiliations

    Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl TNLIST Lab, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China (the record lists this affiliation six times)

  • Original format: PDF
  • Text language: English
  • Keywords

    Locality Sensitive Hashing; BitHash; Near-duplicate detection; Machine learning; Sentiment analysis; Storage efficiency;

