
Scaling up Simhash



Abstract

The seminal work of Charikar (2002) gives a space-efficient sketching algorithm (Simhash) that compresses real-valued vectors to binary vectors while maintaining an estimate of the cosine similarity between any pair of the original real-valued vectors. In this work, we propose a sketching algorithm, Simsketch, that can be applied on top of the results obtained from Simhash. It further reduces the data dimension while still maintaining an estimate of the cosine similarity between the original real-valued vectors, and thereby helps scale up the performance of Simhash. We present theoretical bounds for our result and complement them with experiments on public datasets. Our proposed algorithm is simple and efficient, and can therefore be adopted in practice.
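To make the Simhash step concrete, here is a minimal sketch of the standard random-hyperplane construction: each output bit is the sign of a projection onto a random Gaussian direction, and the fraction of differing bits between two sketches estimates the angle between the original vectors divided by pi. This illustrates only the Simhash baseline; it is not the Simsketch algorithm, whose details are not given in the abstract. The function names (`make_planes`, `simhash`, `estimate_cosine`, `exact_cosine`) and the parameter choices are assumptions made for this illustration.

```python
import math
import random


def make_planes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(vec, planes):
    """Compress a real-valued vector to one sign bit per hyperplane."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]


def estimate_cosine(bits_u, bits_v):
    """Estimate cosine similarity from the Hamming distance of two sketches.

    For random hyperplanes, P[bits differ] = theta / pi, so
    theta is approximated by pi * hamming / num_bits.
    """
    hamming = sum(a != b for a, b in zip(bits_u, bits_v))
    theta = math.pi * hamming / len(bits_u)
    return math.cos(theta)


def exact_cosine(u, v):
    """Exact cosine similarity, for comparison with the estimate."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    rng = random.Random(1)
    u = [rng.uniform(-1.0, 1.0) for _ in range(16)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(16)]
    planes = make_planes(dim=16, num_bits=2048)
    print("exact   :", exact_cosine(u, v))
    print("estimate:", estimate_cosine(simhash(u, planes), simhash(v, planes)))
```

The estimate tightens as the number of bits grows, which is the cost that a second-stage sketch applied on top of the Simhash output aims to reduce.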
