【24h】

Multiple Query-Independent Values Based Asymmetric Ranking for Approximate Nearest Neighbor Search

机译:近似最近邻居搜索的基于多个查询独立值的不对称排序

获取原文
获取原文并翻译 | 示例

摘要

With the advantage of computational and storage efficiency, binary encoding and Hamming distance are widely used in approximate nearest neighbor search. However, a large number of candidate points share the same Hamming distance with the query when the database contains too many points, leading to the issue of confusing ranking. Therefore, asymmetric ranking is proposed to address this issue, in which they only encode the candidate points into binary codes while computing two query-independent values that represent 0 and 1 respectively for each bit, thereby computing asymmetric distance for ranking. The asymmetric distance between the query point and one candidate point equals to the 1-norm distance between the query and a vector of query-independent values corresponding to its binary code. It is more accurate than Hamming distance because of taking advantage of the full information of the query, but the information of candidate points is lost when only computing two query-independent for each bit of encoded candidate points. Therefore, we propose the Multiple query-independent values based Asymmetric Ranking(MARank) in which more than two query-independent values are computed for each bit in order to divide the distance space more densely. The MARank is applicable to many kinds of binary encoding methods such as LSH, SH, ITQ, PCAH and so on. We conduct experiments on five datasets: SIFT-10K, CIFAR-10, Caltech-256, MNIST and NUS-WIDE to compare MARank with the asymmetric ranking in terms of time and accuracy. The results show that the proposed MARank algorithms can achieve up to 22% performance gains over Hamming distance and 13% over the asymmetric distance.
机译:凭借计算和存储效率的优势,二进制编码和汉明距离被广泛用于近似最近邻居搜索。但是,当数据库包含太多点时,大量候选点与查询共享相同的汉明距离,从而导致排名混乱。因此,提出了非对称排序来解决该问题,其中它们仅在计算两个查询独立值(对于每个位分别表示0和1)时仅将候选点编码为二进制代码,从而计算用于排序的非对称距离。查询点与一个候选点之间的不对称距离等于查询与对应于其二进制代码的独立于查询的值的向量之间的1-范数距离。由于利用了查询的全部信息,它比汉明距离更准确,但是当仅对编码的候选点的每一位计算两个独立于查询时,候选点的信息就会丢失。因此,我们提出了基于多个查询无关值的不对称排序(MARank),其中为每个位计算两个以上查询无关值,以便更密集地划分距离空间。 MARank适用于多种二进制编码方法,例如LSH,SH,ITQ,PCAH等。我们在五个数据集上进行了实验:SIFT-10K,CIFAR-10,Caltech-256,MNIST和NUS-WIDE,以比较MARank在时间和准确性上与不对称排名。结果表明,提出的MARank算法在汉明距离上可实现22%的性能提升,在非对称距离上可实现13%的性能提升。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号