A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

IEEE Transactions on Knowledge and Data Engineering

Abstract

Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. Over the past decade, numerous hashing algorithms have been proposed to solve this problem, and each claims to outperform Locality Sensitive Hashing (LSH), the most popular hashing method. However, the evaluations in these papers were not thorough enough, and their claims should be re-examined. If implemented correctly, almost all hashing methods improve in performance as the code length increases, yet many existing hashing papers only report performance for code lengths shorter than 128 bits. In this article, we carefully revisit the problem of searching with a hash index and analyze the pros and cons of two popular hash index search procedures. We then propose a simple but effective new hash index search approach and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, random-projection-based LSH ranked first, which contradicts the claims made in the papers proposing the other 10 algorithms. Our results show that, despite its extreme simplicity, the capability of random-projection-based LSH has been greatly underestimated. For the sake of reproducibility, all the code used in this article is released on GitHub and can serve as a testing platform for fair comparisons between hashing algorithms.
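Two ideas in the abstract are easy to state in code: random-projection LSH, where each bit of a binary code records which side of a random hyperplane a vector falls on, and searching a hash index by ranking database codes by Hamming distance to the query code. The dependency-free Python sketch below shows a generic, textbook-style version of both: Gaussian projection directions, sign-based encoding into an integer code, Hamming ranking over the whole database, and exact Euclidean re-ranking of the best `n_candidates`. It is an illustration under stated assumptions, not the paper's proposed search approach or its released GitHub code; the helper names (`make_projections`, `encode`, `hamming`, `search`), the parameters (`n_bits`, `n_candidates`) and the synthetic data are all hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def make_projections(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw one random Gaussian direction in R^dim per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(x: Vector, projections: List[List[float]]) -> int:
    """Bit i of the code is 1 when x lies on the non-negative side of hyperplane i."""
    code = 0
    for i, r in enumerate(projections):
        if sum(ri * xi for ri, xi in zip(r, x)) >= 0.0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits in which two integer codes differ."""
    return bin(a ^ b).count("1")


def search(query: Vector, data: List[Vector], codes: List[int],
           projections: List[List[float]], k: int, n_candidates: int) -> List[int]:
    """Hamming ranking followed by exact re-ranking (hypothetical helper).

    1. Encode the query with the same projections used for the database.
    2. Sort all database ids by Hamming distance to the query code.
    3. Keep the n_candidates closest ids, re-sort them by exact
       Euclidean distance, and return the k best.
    """
    q_code = encode(query, projections)
    order = sorted(range(len(data)), key=lambda i: hamming(q_code, codes[i]))
    candidates = order[:n_candidates]
    candidates.sort(key=lambda i: math.dist(query, data[i]))
    return candidates[:k]


# Usage on synthetic data (all sizes are arbitrary illustration values).
rng = random.Random(42)
dim, n_bits = 16, 64
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(1000)]
projections = make_projections(dim, n_bits)
codes = [encode(x, projections) for x in data]

# A query that is a slightly perturbed copy of database item 123.
query = [v + 0.01 * rng.gauss(0.0, 1.0) for v in data[123]]
result = search(query, data, codes, projections, k=5, n_candidates=100)
print(result)
```

Because each bit depends only on the sign of a projection, scaling a vector by a positive constant leaves its code unchanged. For sign random projections, two vectors disagree on a given bit with probability θ/π, where θ is the angle between them, so the Hamming distance over `n_bits` bits estimates that angle and becomes less noisy as `n_bits` grows. The `n_candidates` parameter trades search accuracy against the number of exact distance computations spent on re-ranking.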