首页> 外文学位 >NAQ-tree: Effective index structure for similarity search in high dimensional space.
【24h】

NAQ-tree: Effective index structure for similarity search in high dimensional space.

机译:NAQ树:有效的索引结构,用于高维空间中的相似度搜索。

获取原文
获取原文并翻译 | 示例

摘要

Similarity search in high-dimensional metric space is the key operation in many applications, such as multi-media databases, content-based image retrieval, object recognition, molecular biology, medical imaging, security, wildlife, and critical health cases, among others. The high dimensionality and the huge size of the data set require an index structure to facilitate the search. State-of-the-art index structures are mostly built by partitioning the data set based on distances to certain reference point(s). Using the index, the search is confined to a small number of partitions. Based on careful investigation and analysis of the existing indexing techniques described in the literature, this thesis proposes a new index structure, called NAQ-tree (Nested-Approximate-eQuivalence-class tree), which overcomes the disadvantages of the investigated index structures. NAQ-tree is constructed by recursively dividing the data set into nested approximate equivalence classes. NAQ-tree is effective in serving variety of queries, including k-nearest neighbor, approximate k-nearest neighbor and reverse k-nearest neighbor queries.;While other researchers separate the filtering and verification stages of the process used to locate reverse nearest neighbors, we integrate the filtering and verification steps and use the information obtained in the verification stage to further improve the filtering rate. Our approach delivers results earlier and hence well serves real-time applications. Actually, incremental reporting of the result is at least equally important for multi-step nearest neighbor search to reduce the number of expensive distance computations required to complete the search and to better serve applications where the distance is known at run-time. Algorithms described in the literature separate multi-step nearest neighbor search into filtering and refinement stages with the result delivered by the end of the refinement stage. We extended an existing algorithm in a way to produce the nearest neighbors incrementally in an optimal way in the sense that a nearest neighbor is output as soon as it can be determined using the existing information; thus, nearest neighbors are produced in order. The conducted analysis and the reported comparative test results demonstrate the effectiveness of NAQ-tree in significantly improving the search efficiency.;Index Terms: high dimensionality, dimensionality reduction, indexing, similarity search, online monitoring, content-based image retrieval, k-nearest neighbor search, partitioning, reverse nearest neighbor search.;Approximate k-nearest neighbor search is essential for time critical applications which are mostly characterized by the need to search huge datasets; we reduce the dataset size by clustering, and then we use one representative from each cluster to build indexes using a set of NAQ-trees residing on multiple nodes of a parallel machine. In the query processing phase, the built indexes are queried in parallel and from each tree only a very small number of index nodes are searched to report the partial results which are combined into the final result. As reverse k-nearest neighbor queries are concerned, the properties of NAQ-tree provide strong pruning rules.
机译:高维度度量空间中的相似性搜索是许多应用程序中的关键操作,例如多媒体数据库,基于内容的图像检索,对象识别,分子生物学,医学成像,安全性,野生生物和重大健康案例等。数据集的高维度和巨大规模要求使用索引结构来促进搜索。最先进的索引结构主要是通过根据与某些参考点的距离对数据集进行分区来构建的。使用索引,搜索被限制在少量的分区中。在对文献中描述的现有索引技术进行仔细研究和分析的基础上,本文提出了一种新的索引结构,即NAQ-树(Nested-近似-eQuivalence类树),它克服了所研究索引结构的缺点。通过将数据集递归划分为嵌套的近似等价类来构造NAQ树。 NAQ树可有效地服务于各种查询,包括k最近邻,近似k最近邻和反向k最近邻查询。;虽然其他研究人员将用于定位反向最近邻的过程的过滤和验证阶段分开,我们整合了过滤和验证步骤,并使用在验证阶段获得的信息来进一步提高过滤率。我们的方法可以更快地提供结果,因此可以很好地服务于实时应用。实际上,增量报告结果至少对于多步最近邻居搜索同等重要,以减少完成搜索所需的昂贵距离计算的数量,并更好地为运行时已知距离的应用提供服务。文献中描述的算法将多步最近邻居搜索分为过滤和优化阶段,结果在优化阶段结束时传递。从某种意义上说,只要可以使用现有信息确定最近邻居,就可以以一种最佳方式递增地生成最近邻居,从而扩展了现有算法。因此,顺序产生最近的邻居。进行的分析和报告的比较测试结果证明了NAQ树在显着提高搜索效率方面的有效性。关键词:高维,降维,索引,相似度搜索,在线监控,基于内容的图像检索,k最近邻居搜索,划分,反向最近邻居搜索。近似k近邻搜索对于时间紧迫的应用程序至关重要,这些应用程序的主要特征是需要搜索庞大的数据集;我们通过聚类来减小数据集的大小,然后使用来自每个聚类的一个代表来使用一组位于并行计算机的多个节点上的NAQ树来建立索引。在查询处理阶段,并行查询已建立的索引,并且从每棵树中仅搜索极少量的索引节点以报告部分结果,这些结果被组合为最终结果。考虑到反向k最近邻居查询,NAQ树的属性提供了强大的修剪规则。

著录项

  • 作者

    Zhang, Ming.;

  • 作者单位

    University of Calgary (Canada).;

  • 授予单位 University of Calgary (Canada).;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2010
  • 页码 170 p.
  • 总页数 170
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号