Accelerating high-dimensional nearest neighbor queries

Abstract

The performance of nearest neighbor (NN) queries degrades noticeably as data dimensionality increases, owing to the reduced selectivity of high-dimensional data and the larger number of seek operations during NN-query execution. If the NN-radii were known in advance, the disk accesses could be reordered so that seek operations are minimized. We therefore propose a new way of estimating the NN-radius based on the fractal dimensionality and sampling. It is applicable to any page-based index structure. We show that its estimation error is considerably lower than that of previous approaches. In the second part of the paper, we present two applications of this technique. We show how the radius estimates can be used to transform k-NN queries into at most two range queries, and how they can be used to reduce the number of page reads during all-NN queries. In both cases, we observe significant speedups over traditional techniques on both synthetic and real-world data.
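The abstract only summarizes the approach, so the following Python sketch illustrates two of its ideas under assumptions that are ours, not the paper's. First, it estimates the k-NN radius from a uniform random sample by assuming a power law N(r) ∝ r^D, where D is the fractal dimension (taken here as already known). If the m-th nearest sample point lies at distance r_m and the sample holds a fraction s of the data, then r_k ≈ r_m · (k·s / m)^(1/D). Second, it answers a k-NN query with at most two range queries: one at the estimated radius, and, if that returns fewer than k points, one at a radius that is guaranteed to contain k points. The names `estimate_knn_radius`, `knn_two_range_queries` and the linear-scan `range_query` are hypothetical stand-ins for a real page-based index, and the fallback radius rule is one simple correct choice, not necessarily the authors' technique.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.dist(a, b)


def range_query(data: Sequence[Point], q: Point, r: float) -> List[int]:
    """Hypothetical stand-in for an index range query: indices of points within r of q.

    A real page-based index would prune pages; this linear scan only models the result set.
    """
    return [i for i, p in enumerate(data) if dist(p, q) <= r]


def estimate_knn_radius(sample: Sequence[Point], q: Point, k: int,
                        fraction: float, fractal_dim: float) -> float:
    """Estimate the k-NN radius of q in the full data from a sample (assumed power law).

    Assumes N_full(r) ~ c * r**fractal_dim and that the sample holds `fraction` of the data,
    so the sample's m-th neighbour at r_m covers about m / fraction full-data points.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    d = sorted(dist(p, q) for p in sample)
    expected = k * fraction                      # k-NN points expected to fall in the sample
    m = max(1, min(len(d), int(expected)))       # which sample neighbour to extrapolate from
    return d[m - 1] * (expected / m) ** (1.0 / fractal_dim)


def knn_two_range_queries(data: Sequence[Point], q: Point, k: int,
                          r_est: float) -> Tuple[List[int], int]:
    """Return (indices of the k nearest neighbours, number of range queries issued)."""
    if k > len(data):
        raise ValueError("k exceeds data size")
    hits = range_query(data, q, r_est)
    queries = 1
    if len(hits) < k:
        # Estimate too small. The distance to the farthest of ANY k points bounds the
        # true k-NN radius, so pad the hits with other points up to k candidates.
        seen = set(hits)
        pad = [i for i in range(len(data)) if i not in seen][: k - len(hits)]
        r_safe = max(dist(data[i], q) for i in hits + pad)
        hits = range_query(data, q, r_safe)      # guaranteed to contain >= k points
        queries = 2
    hits.sort(key=lambda i: dist(data[i], q))
    return hits[:k], queries


if __name__ == "__main__":
    rng = random.Random(0)
    data = [tuple(rng.random() for _ in range(8)) for _ in range(5000)]
    fraction = 0.02
    sample = rng.sample(data, int(fraction * len(data)))
    q = tuple(rng.random() for _ in range(8))
    k = 10
    # Uniform 8-d data: its fractal dimension is assumed to be about 8.
    r_est = estimate_knn_radius(sample, q, k, fraction, fractal_dim=8.0)
    result, n_queries = knn_two_range_queries(data, q, k, r_est)
    brute = sorted(range(len(data)), key=lambda i: dist(data[i], q))[:k]
    assert sorted(dist(data[i], q) for i in result) == [dist(data[i], q) for i in brute]
    print(f"estimated radius {r_est:.3f}, range queries used: {n_queries}")
```

The second query always returns the exact answer, so a poor radius estimate costs extra I/O but never correctness; a tighter estimate simply makes the single-query case more frequent.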