When Is Nearest Neighbors Indexable?

机译：最近的邻居何时可索引？

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

In this paper, we consider whether traditional index structures are effective in processing unstable nearest neighbors workloads. It is known that under broad conditions, nearest neighbors workloads become unstable-distances between data points become indistinguishable from each other. We complement this earlier result by showing that if the workload for your application is unstable, you are not likely to be able to index it efficiently using (almost all known) multidimensional index structures. For a broad class of data distributions, we prove that these index structures will do no better than a linear scan of the data as dimensionality increases. Our result has implications for how experiments should be designed on index structures such as R-Trees, X-Trees and SR-Trees: Simply put, experiments trying to establish that these index structures scale with dimensionality should be designed to establish cross-over points, rather than to show that the methods scale to an arbitrary number of dimensions. In other words, experiments should seek to establish the dimensionality of the dataset at which the proposed index structure deteriorates to linear scan, for each data distribution of interest; that linear scan will eventually dominate is a given. An important problem is to analytically characterize the rate at which index structures degrade with increasing dimensionality, because the dimensionality of a real data set may well be in the range that a particular method can handle. The results in this paper can be regarded as a step towards solving this problem. Although we do not characterize the rate at which a structure degrades, our techniques allow us to reason directly about a broad class of index structures, rather than the geometry of the nearest neighbors problem, in contrast to earlier work.

机译：在本文中，我们考虑了传统的索引结构在处理不稳定的最近邻居工作负载方面是否有效。众所周知，在宽泛的条件下，最近的邻居工作负载变得不稳定，数据点之间的距离变得难以区分。通过显示如果应用程序的工作负载不稳定，您将无法使用（几乎所有已知的）多维索引结构对其进行高效索引，从而补充了此早期结果。对于广泛的数据分布类别，我们证明这些索引结构不会随着维度的增加而比线性扫描数据更好。我们的结果对应该如何设计诸如R树，X树和SR树之类的索引结构的实验产生了影响：简单地说，试图建立这些索引结构随维数缩放的实验应设计为建立交叉点，而不是表明方法可以缩放到任意数量的维度。换句话说，对于感兴趣的每个数据分布，实验都应尝试建立所提出的索引结构恶化为线性扫描的数据集的维数；线性扫描最终将占主导地位已成定局。一个重要的问题是分析表征索引结构随维数增加而退化的速率，因为实际数据集的维数很可能在特定方法可以处理的范围内。本文的结果可视为解决此问题的一步。尽管我们没有描述结构退化的速率，但与早期的工作相比，我们的技术使我们可以直接针对广泛的索引结构类，而不是最邻近问题的几何形状进行推理。

著录项

来源
《International Conference on Database Theory(ICDT 2005); 20050105-07; Edinburgh(GB)》|2005年|P.158-172|共15页
会议地点 Edinburgh(GB)
作者
Uri Shaft; Raghu Ramakrishnan;
展开▼
作者单位

Oracle Corporation, 500 Oracle Parkway, Redwood Shores, CA 94065 USA;

展开▼
会议组织
原文格式 PDF
正文语种 eng
中图分类各种专用数据库;
关键词
入库时间 2022-08-26 13:46:23

相似文献

外文文献
中文文献
专利

1. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors [J] . IEEE transactions on visualization and computer graphics . 2020,第2期

机译：PANENE：用于索引和查询近似k最近邻居的渐进算法
2. An efficient clustering algorithm based on the k-nearest neighbors with an indexing ratio [J] . Qaddoura Raneem, Faris Hossam, Aljarah Ibrahim International journal of machine learning and cybernetics . 2020,第3期

机译：基于具有索引比的k最近邻的高效聚类算法
3. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors [J] . Journal of international management . 2020,第2期

机译：panene：一种用于索引和查询近似k最近邻居的渐进算法
4. Indexing in k-Nearest Neighbor Graph by Hash-Based Hill-Climbing [C] . Munlika Rattaphun, Amorntip Prayoonwong, Chih- Yi Chiu Proceedings of the Sixteenth International Conference on Machine Vision Applications . 2019

机译：基于哈希的爬山法在k最近邻图中建立索引
5. Voting Nearest Neighbors: SVM Constraints Selection Algorithm Based on K-Nearest Neighbors [D] . Moreira da Costa, Leandro. 2019

机译：投票最近的邻居：基于K-Indect邻居的SVM约束选择算法
6. Fast open modification spectral library searching through approximate nearest neighbor indexing [O] . Wout Bittremieux, Pieter Meysman, William Stafford Noble, -1

机译：通过近似最近邻居索引快速开放修改谱库搜索
7. The K Group Nearest-Neighbor Query on Non-indexed RAM-Resident Data [O] . Roumelis George, Vassilakopoulos Michael, Corral Liria Antonio Leopoldo, 2016

机译：关于非索引Ram驻留数据的K组最近邻查询

When Is Nearest Neighbors Indexable?

摘要

著录项

相似文献

相关主题

期刊订阅