An approximate algorithm for top-k closest pairs join query in large high dimensional data

Fabrizio Angiulli; Clara Pizzuti

Abstract

In this paper we present a novel approximate algorithm to calculate the top-k closest pairs join query of two large and high dimensional data sets. The algorithm has worst case time complexity O(d^2 nk) and space complexity O(nd) and guarantees a solution within a O(d^(1+1/t)) factor of the exact one, where t ∈ {1,2,..., ∞} denotes the Minkowski metrics L_t of interest and d the dimensionality. It makes use of the concept of space filling curve to establish an order between the points of the space and performs at most d + 1 sorts and scans of the two data sets. During a scan, each point from one data set is compared with its closest points, according to the space filling curve order, in the other data set and points whose contribution to the solution has already been analyzed are detected and eliminated. Experimental results on real and synthetic data sets show that our algorithm behaves as an exact algorithm in low dimensional spaces; it is able to prune the entire (or a considerable fraction of the) data set even for high dimensions if certain separation conditions are satisfied; in any case it returns a solution within a small error to the exact one.
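
The general idea in the abstract (order points along a space filling curve, then compare each point of one data set only with its curve neighbours in the other) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a Z-order (Morton) curve rather than whichever curve the paper uses, makes a single pass instead of up to d + 1 sorts and scans, does no pruning, and carries no approximation guarantee. The names `morton_key`, `minkowski`, `approx_top_k_closest_pairs` and the parameters `window` and `bits` are hypothetical, and coordinates are assumed to be non-negative integers below 2^bits.

```python
import bisect
import heapq


def morton_key(point, bits=10):
    """Z-order key: interleave the bits of non-negative integer coordinates."""
    key = 0
    for b in range(bits - 1, -1, -1):
        for c in point:
            key = (key << 1) | ((c >> b) & 1)
    return key


def minkowski(p, q, t=2):
    """Minkowski distance L_t; t = float('inf') gives the maximum norm."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if t == float("inf"):
        return max(diffs)
    return sum(x ** t for x in diffs) ** (1.0 / t)


def approx_top_k_closest_pairs(A, B, k, window=2, t=2, bits=10):
    """Return up to k (distance, index_in_A, index_in_B) tuples, closest first.

    Illustrative heuristic only (an assumption, not the published method):
    each point of A is compared with at most 2 * window points of B that
    are adjacent to it in Z-order.
    """
    # Sort B once by curve position, remembering the original indices.
    b_sorted = sorted((morton_key(q, bits), j) for j, q in enumerate(B))
    b_keys = [key for key, _ in b_sorted]

    heap = []  # max-heap via negated distances; holds the best k pairs so far
    for i, p in enumerate(A):
        pos = bisect.bisect_left(b_keys, morton_key(p, bits))
        lo, hi = max(0, pos - window), min(len(B), pos + window)
        for _, j in b_sorted[lo:hi]:
            dist = minkowski(p, B[j], t)
            if len(heap) < k:
                heapq.heappush(heap, (-dist, i, j))
            elif dist < -heap[0][0]:
                heapq.heapreplace(heap, (-dist, i, j))
    return sorted((-neg, i, j) for neg, i, j in heap)


if __name__ == "__main__":
    A = [(0, 0), (10, 10), (50, 50)]
    B = [(1, 0), (11, 11), (100, 100)]
    print(approx_top_k_closest_pairs(A, B, k=2, t=1))
```

Setting `window` to at least `len(B)` makes every pair a candidate, which turns the sketch into an exact (quadratic) join and gives a simple way to check the heuristic on small inputs.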


Bibliographic details

Source: Data & Knowledge Engineering, 2005, No. 3, pp. 263-281 (19 pages)
Authors: Fabrizio Angiulli; Clara Pizzuti
Affiliation: ICAR-CNR, Istituto di Calcolo e Reti ad Alte Prestazioni, Consiglio Nazionale delle Ricerche, 87030 Rende, CS, Italy
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords: technologies of databases; applications of data and knowledge engineering; high dimensional data; k-closest pairs; space filling curves

Similar literature

1. Algorithms for processing K-closest-pair queries in spatial databases [J]. A. Corral, Y. Manolopoulos, Y. Theodoridis. Data & Knowledge Engineering, 2004, No. 1
2. Approximate k-Closest-Pairs in Large High-Dimensional Data Sets [J]. Fabrizio Angiulli, Clara Pizzuti. Journal of Mathematical Modelling and Algorithms, 2005, No. 2
3. Top-k closest pairs join query: an approximate algorithm for large high dimensional data [C]. Angiulli, F., Pizzuti, . 2004
4. Approximate Clustering Algorithms for High Dimensional Streaming and Distributed Data [D]. Carraher, Lee A. 2018
5. From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data [O]. Rainer Opgen-Rhein, Korbinian Strimmer. 2007
6. An Index Structure for Improving Closest Pairs and Related Join Queries in Spatial Databases [O]. Congjun Yang, King-ip Lin. 2007
