首页> 外文期刊>IEEE Transactions on Knowledge and Data Engineering >Fast Cosine Similarity Search in Binary Space with Angular Multi-Index Hashing
【24h】

Fast Cosine Similarity Search in Binary Space with Angular Multi-Index Hashing

机译:角多索引散列在二进制空间中的快速余弦相似度搜索

获取原文
获取原文并翻译 | 示例

摘要

Given a large dataset of binary codes and a binary query point, we address how to efficiently find K codes in the dataset that yield the largest cosine similarities to the query. The straightforward answer to this problem is to compare the query with all items in the dataset, but this is practical only for small datasets. One potential solution to enhance the search time and achieve sublinear cost is to use a hash table populated with binary codes of the dataset and then look up the nearby buckets to the query to retrieve the nearest neighbors. However, if codes are compared in terms of cosine similarity rather than the Hamming distance, then the main issue is that the order of buckets to probe is not evident. To examine this issue, we first elaborate on the connection between the Hamming distance and the cosine similarity. Doing this allows us to systematically find the probing sequence in the hash table. However, solving the nearest neighbor search with a single table is only practical for short binary codes. To address this issue, we propose the angular multi-index hashing search algorithm which relies on building multiple hash tables on binary code substrings. The proposed search algorithm solves the exact angular K nearest neighbor problem in a time that is often orders of magnitude faster than the linear scan baseline and even approximation methods.
机译:给定一个大型的二进制代码数据集和一个二进制查询点,我们解决如何有效地在数据集中找到与查询产生最大余弦相似度的K代码。这个问题的直接答案是将查询与数据集中的所有项目进行比较,但这仅适用于小型数据集。延长搜索时间并达到亚线性成本的一种潜在解决方案是使用填充有数据集二进制代码的哈希表,然后在查询中查找附近的存储桶以检索最近的邻居。但是,如果按照余弦相似度而不是汉明距离对代码进行比较,则主要问题是要探测的存储桶的顺序不明显。为了研究这个问题,我们首先详细说明汉明距离和余弦相似度之间的联系。这样做使我们能够在哈希表中系统地找到探测序列。但是,用单个表解决最近邻居搜索仅适用于短二进制代码。为了解决这个问题,我们提出了角度多索引哈希搜索算法,该算法依赖于在二进制代码子字符串上构建多个哈希表。所提出的搜索算法可以在比线性扫描基线甚至近似方法快几个数量级的时间内解决精确的角K最近邻问题。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号