Source: JMLR: Workshop and Conference Proceedings

Randomness Efficient Feature Hashing for Sparse Binary Data


Abstract

We present sketching algorithms for sparse binary datasets that maintain a binary version of the dataset after sketching while simultaneously preserving multiple similarity measures (Jaccard similarity, cosine similarity, inner product, and Hamming distance) on the same sketch. A major advantage of our algorithms is that they are randomness efficient: they require a number of random bits that is logarithmic in the dimension, whereas other competitive algorithms require a number that is linear in the dimension. Our proposed algorithms are efficient, offer a compact sketch of the dataset, and can be deployed efficiently in a distributed setting. We present a theoretical analysis of our approach and complement it with extensive experiments on public datasets. For analysis purposes, our algorithms require a natural assumption on the dataset. We empirically verify this assumption and observe that it holds on several real-world datasets.
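To make the quantities in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It computes the four similarity measures on binary vectors stored as sets of nonzero indices, then builds a generic OR-bucket binary sketch. The hash is a standard 2-universal function drawn with two random integers, which is one assumed way to use only a logarithmic number of random bits; the names `similarities`, `make_hash`, and `binary_sketch` and the prime `P` are hypothetical and chosen for illustration only.

```python
import math
import random

# Assumption: the data dimension is smaller than this Mersenne prime (2^31 - 1).
P = 2_147_483_647


def similarities(a, b):
    """Four measures for binary vectors given as sets of nonzero indices."""
    inter = len(a & b)
    union = len(a | b)
    return {
        "inner_product": inter,
        "hamming": len(a ^ b),  # positions where the two vectors differ
        "jaccard": inter / union if union else 1.0,
        "cosine": inter / math.sqrt(len(a) * len(b)) if a and b else 0.0,
    }


def make_hash(n_buckets, seed):
    """Draw h(i) = ((a*i + b) mod P) mod n_buckets from a 2-universal family.

    Only the two integers a and b are random, so the seed length is
    about 2 * log2(P) bits rather than one random value per dimension.
    """
    rng = random.Random(seed)
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda i: ((a * i + b) % P) % n_buckets


def binary_sketch(x, h):
    """OR-bucket sketch: bucket j is set if any nonzero index of x maps to j."""
    return {h(i) for i in x}


if __name__ == "__main__":
    u = {1, 5, 9, 20}
    v = {5, 9, 33}
    h = make_hash(n_buckets=16, seed=0)
    print(similarities(u, v))  # exact values on the original vectors
    # Raw measures on the sketches; collisions make these only approximate.
    print(similarities(binary_sketch(u, h), binary_sketch(v, h)))
```

The sketch stays binary, and sketches of separate data shards can be merged with a bitwise OR, which is why this style of hashing suits a distributed setting. Recovering accurate similarity estimates from such a sketch requires correcting for collisions, which this illustration does not attempt.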
