...
首页> 外文期刊>Advances in Computer Science and Information Technology: ACSIT >A Recursive Partitioning Method for Nearest Neighbor Search in High Dimensional Data
【24h】

A Recursive Partitioning Method for Nearest Neighbor Search in High Dimensional Data

机译:高维数据中最近邻南搜索的递归分区方法

获取原文
   

获取外文期刊封面封底 >>

       

摘要

Many real world applications require searching a large amount of data to find the objects similar to the search queries. Nearest Neighbour Search is the common operation that has been used to conduct similarity search in vast areas like Content-Based Image Retrieval, Web search engines, micro array data analysis etc. Dimensionality forces us to look at data from a different perspective when dealing with such large data. In this work we propose a recursive partitioning and distance-based indexing scheme for large and high-dimensional data to retrieve the nearest neighbours for a given query. This method works by dividing the data space into disjoint partitions based on the distance from a reference point to the data objects. In the next level for each sub-partition a reference point is selected and again it is partitioned into further sub-sub-partitions. Main advantage of this method is that it reduces the search space. To find the nearest neighbours we use Soergel distance metric which is a dissimilarity-based distance metric to find the association between two data objects. We are able to retrieve the top 100 nearest neighbours (kNN) by searching in only a single bin where all the nearest neighbours lie for the given query according to the distance from a reference point. We have validated our method by conducting experiments with the following data sets: ZINC data set, AT and T Faces Database. Our results show that the proposed method is having a very less construction (off-line) time and search (on-line) time compared to brute-force linear scan. Proposed method reduces or prunes the search space from 100 percent to 10 percent or even less, which will save lot of computation time. We verified the proposed method with other methods and compared the performance and the results are presented.
机译:许多真实世界应用程序需要搜索大量数据以查找类似于搜索查询的对象。最近的邻居搜索是用于在基于内容的图像检索,Web搜索引擎,微阵列数据分析等广大领域进行相似性搜索的常见操作。维度迫使我们在处理此类时从不同的角度查看数据大数据。在这项工作中,我们提出了一种用于大型和高维数据的递归分区和距离的索引方案,以检索给定查询的最近邻居。该方法通过将数据空间划分为基于与来自数据对象的参考点的距离的距离分区。在每个子分区的下一个级别中,选择参考点,并再次将其分成另一个子分区。此方法的主要优点是它减少了搜索空间。为了找到最近的邻居,我们使用Sogle距离度量是一种基于不相似的距离度量,以找到两个数据对象之间的关联。我们能够通过仅在单个仓库中搜索前100名最近的邻居(KNN),其中所有最近的邻居根据与参考点的距离为给定查询。我们通过使用以下数据集进行实验验证了我们的方法:zinc数据集,AT和T Faces数据库。我们的结果表明,与Brute-Force线性扫描相比,所提出的方法具有非常少的结构(离线)时间和搜索(在线)时间。提出的方法将搜索空间从100%降至10%甚至10%甚至更少,这将节省大量的计算时间。我们用其他方法验证了所提出的方法,并验证了性能和结果。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号