International Conference on Very Large Databases

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

Abstract

For similarity search in high-dimensional vector spaces (or 'HDVSs'), researchers have proposed a number of new methods (or adaptations of existing methods) based, in the main, on data-space partitioning. However, the performance of these methods generally degrades as dimensionality increases. Although this phenomenon, known as the 'dimensional curse', is well known, little or no quantitative analysis of it is available. In this paper, we provide a detailed analysis of partitioning and clustering techniques for similarity search in HDVSs. We show formally that these methods exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10. Consequently, we propose an alternative organization based on approximations to make the unavoidable sequential scan as fast as possible. We describe a simple vector approximation scheme, called VA-file, and report on an experimental evaluation of this and of two tree-based index methods (an R*-tree and an X-tree).
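The abstract does not spell out how the approximations are built or scanned. As a rough illustration only, the following Python sketch shows one way an approximation-based sequential scan could work. It assumes coordinates in [0, 1], a uniform grid with a fixed number of bits per dimension, and Euclidean distance. The function names (`build_approximations`, `euclidean`, `lower_bound`, `nearest_neighbor`) and the `bits` parameter are hypothetical and are not taken from the paper.

```python
import math


def build_approximations(vectors, bits=4):
    """Quantize each coordinate (assumed to lie in [0, 1]) into one of
    2**bits equal-width cells. Hypothetical helper, not the paper's API."""
    cells = 1 << bits
    return [[min(int(x * cells), cells - 1) for x in v] for v in vectors]


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length vectors."""
    total = 0.0
    for a, b in zip(p, q):
        total += (a - b) ** 2
    return math.sqrt(total)


def lower_bound(query, approx, bits=4):
    """Smallest possible distance from the query to any point inside the
    grid cell described by `approx` (one cell index per dimension)."""
    cells = 1 << bits
    total = 0.0
    for q, c in zip(query, approx):
        lo, hi = c / cells, (c + 1) / cells
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
        # If the query coordinate falls inside the cell, it contributes 0.
    return math.sqrt(total)


def nearest_neighbor(query, vectors, approximations, bits=4):
    """Scan all approximations, then fetch full vectors in order of
    increasing lower bound until no remaining candidate can win.
    Returns (index, distance, number of full vectors fetched)."""
    bounds = sorted(
        (lower_bound(query, a, bits), i) for i, a in enumerate(approximations)
    )
    best_i, best_d, fetched = -1, math.inf, 0
    for lb, i in bounds:
        if lb >= best_d:
            break  # every remaining candidate is at least this far away
        fetched += 1
        d = euclidean(query, vectors[i])
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, fetched
```

In this sketch the scan over the compact approximations is still linear in the number of vectors, which is in the spirit of the abstract's point that a sequential scan is unavoidable at high dimensionality. The saving comes from reading full vectors only for the candidates whose lower bound is below the best distance found so far.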
