首页> 美国卫生研究院文献>other >Distributed query-aware quantization for high-dimensional similarity searches

【2h】

Distributed query-aware quantization for high-dimensional similarity searches

机译：用于高维相似性搜索的分布式查询感知量化

代理获取

本网站仅为用户提供外文OA文献查询和代理获取服务，本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文，但由于OA文献来源多样且变更频繁，仍可能出现获取不到、文献不完整或与标题不符等情况，如果获取不到我们将提供退款服务。请知悉。

页面导航

摘要
著录项
相似文献
相关主题

摘要

The concept of similarity is used as the basis for many data exploration and data mining tasks. Nearest Neighbor (NN) queries identify the most similar items, or in terms of distance the closest points to a query point. Similarity is traditionally characterized using a distance function between multi-dimensional feature vectors. However, when the data is high-dimensional, traditional distance functions fail to significantly distinguish between the closest and furthest points, as few dissimilar dimensions dominate the distance function. Localized similarity functions, i.e. functions that only consider dimensions close to the query, quantize each dimension independently and only compute similarity for the dimensions where the query and the points fall into the same bin. These quantizations are query-agnostic. There is potential to improve accuracy when a query-dependent quantization is used.In this paper we propose a Query dependent Equi-Depth (QED) on-the-fly quantization method to improve high-dimensional similarity searches. The quantization is done for each dimension at query time and localized scores are generated for the closest p fraction of the points while a constant penalty is applied for the rest of the points. QED not only improves the quality of the distance metric, but also improves query time performance by filtering out non relevant data. We propose a distributed indexing and query algorithm to efficiently compute QED. Our experimental results show improvements in classification accuracy as well as query performance up to one order of magnitude faster than Manhattan-based sequential scan NN queries over datasets with hundreds of dimensions.

机译：相似性的概念被用作许多数据探索和数据挖掘任务的基础。最近邻居（NN）查询标识最相似的项，或者根据距离来确定最接近点到查询点的距离。传统上，相似性是使用多维特征向量之间的距离函数来表征的。但是，当数据是高维数据时，传统的距离函数无法显着地区分最近的点和最远的点，因为很少有不同的维度主导距离函数。局部相似度函数，即仅考虑维度接近查询的函数，独立地量化每个维度，并且仅针对查询和点落入同一单元格的维度计算相似度。这些量化与查询无关。当使用依赖于查询的量化时，有可能提高准确性。本文提出了一种基于查询的等速（QED）动态量化方法，以改进高维相似性搜索。在查询时对每个维度进行量化，并针对最接近的p个分数点生成局部分数，同时对其余的分数应用恒定的惩罚。 QED不仅可以提高距离度量的质量，还可以通过过滤掉无关数据来提高查询时间性能。我们提出了一种分布式索引和查询算法来有效地计算QED。我们的实验结果表明，与基于曼哈顿的数百维数据集上的顺序扫描NN查询相比，分类准确性和查询性能提高了一个数量级。

著录项

期刊名称 other
作者
Gheorghi Guzun; Guadalupe Canahuate;
展开▼
作者单位

展开▼
年(卷),期 -1(2018),-1
年度 -1
页码 373–384
总页数 39
原文格式 PDF
正文语种
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. High-dimensional similarity searches using query driven dynamic quantization and distributed indexing [J] . Guzun Gheorghi, Canahuate Guadalupe Distributed and Parallel Databases . 2020,第2期

机译：使用查询驱动动态量化和分布式索引的高维相似性搜索
2. High-dimensional similarity searches using query driven dynamic quantization and distributed indexing [J] . Ecological restoration . 2020,第2期

机译：使用查询驱动动态量化和分布式索引的高维相似性搜索
3. LSH-based distributed similarity indexing with load balancing in high-dimensional space [J] . Wu Jiagao, Shen Lu, Liu Linfeng Journal of supercomputing . 2020,第1期

机译：高空间空间基于LSH的负载均衡分布式相似性索引
4. High-Dimensional Similarity Searches Using A Metric Pseudo-Grid [C] . Digout, C., Nascimento, . 2005

机译：使用度量伪网格的高维相似性搜索
5. Quantization techniques for similarity search in high-dimensional data spaces. [D] . Garcia-Arellano, Christian Marcelo. 2002

机译：用于在高维数据空间中进行相似性搜索的量化技术。
6. Supporting Dynamic Quantization for High-Dimensional Data Analytics [O] . Gheorghi Guzun, Guadalupe Canahuate -1

机译：支持动态量化以进行高维数据分析
7. High-Dimensional Similarity Searches Using A Metric Pseudo-Grid [O] . Christian Digout, Mario A. Nascimento 2008

机译：使用度量伪网格的高维相似性搜索

Distributed query-aware quantization for high-dimensional similarity searches

摘要

著录项

相似文献

相关主题

期刊订阅