...
首页> 外文期刊>The VLDB journal >Query-aware locality-sensitive hashing scheme for norm
【24h】

Query-aware locality-sensitive hashing scheme for norm

机译:规范的查询感知的局部敏感哈希方案

获取原文
获取原文并翻译 | 示例

摘要

The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing schemes to tackle the c-ANN search problem. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with approximation ratio of integer . In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the "anchor" for bucket partition. Accordingly, a query-aware LSH function under a specific norm with is a random projection coupled with query-aware bucket partition, which removes random shift required by traditional query-oblivious LSH functions. The query-aware bucket partitioning strategy can be easily implemented so that query performance is guaranteed. For each norm , based on the corresponding p-stable distribution, we propose a novel LSH scheme named query-aware LSH (QALSH) for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio . In addition, we propose a heuristic variant named QALSH to improve the scalability of QALSH. Extensive experiments show that QALSH and QALSH outperform the state-of-the-art schemes, especially in high-dimensional space. Specifically, by using a ratio , QALSH can achieve much better query quality.
机译:高维空间中的c近似最近邻(c-ANN)搜索问题在许多应用程序中至关重要,例如图像数据库和数据挖掘。局部敏感哈希(LSH)及其变体是解决c-ANN搜索问题的著名索引方案。传统上,LSH函数是以查询无关的方式构造的,即在任何查询到达之前先对存储分区进行分区。但是,更接近查询的对象可能会划分为不同的存储桶,这是不希望的。由于使用了不会查询的存储分区,因此,用于外部存储器的最新LSH方案(即C2LSH和LSB-Forest)只能以近似的整数比率工作。在本文中,我们介绍了一种新颖的查询感知桶分区概念,该概念将给定查询用作桶分区的“锚点”。因此,在特定范数下的具有查询意识的LSH函数是随机投影与具有查询意识的存储分区的结合,从而消除了传统的对查询无意识的LSH函数所需的随机移位。可以轻松实现查询感知的存储桶分区策略,从而确保查询性能。对于每个范数,基于相应的p稳定分布,我们提出了一种新颖的LSH方案,称为外部查询c-ANN搜索的查询感知LSH(QALSH)。我们的理论研究表明,QALSH在查询质量上享有保证。使用查询感知的LSH功能可使QALSH以任何近似比率工作。另外,我们提出了一种启发式变体QALSH,以提高QALSH的可伸缩性。大量实验表明,QALSH和QALSH优于最新方案,尤其是在高维空间中。具体来说,通过使用比率,QALSH可以实现更好的查询质量。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号