首页> 外文学位 >Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on gpu clusters.

【24h】

Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on gpu clusters.

机译：有效计算gpu群集上的大型高维数据集的k最近邻图。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

The k-Nearest Neighbor Graph (k-NNG) and the related k-Nearest Neighbor (k-NN) methods have a wide variety of applications in areas such as bioinformatics, machine learning, data mining, clustering analysis, and pattern recognition. Our application of interest is manifold embedding. Due to the large dimensionality of the input data ( <15k), spatial subdivision based techniques such OBBs, k-d tree, BSP etc., are not viable. The only alternative is the brute-force search, which has two distinct parts. The first finds distances between individual vectors in the corpus based on a pre-defined metric. Given the distance matrix, the second step selects k nearest neighbors for each member of the query data set. This thesis presents the development and implementation of a distributed exact k-Nearest Neighbor Graph (k-NNG) construction method. The proposed method uses Graphics Processing Units (GPUs) and exploits multiple levels of parallelism for distributed computational systems using GPUs. It is scalable for different cluster sizes, with each compute node in the cluster containing multiple GPUs. The distance computation is formulated as a basic matrix multiplication and reduction operation. The optimized CUBLAS matrix multiplication library is used for this purpose. Various distance metrics such as Euclidian, cosine, and Pearson are supported. For k-NNG construction, two different methods are presented. The first is based on an approach called batch index sorting to build the k-NNG with three sorting operations. This method uses the optimized radix sort implementation in the Thrust library for GPU. The second is an efficient implementation using the latest GPU functionalities of a variant of the quick select algorithm. Overall, the batch index sorting based k-NNG method is approximately 13x faster than a distributed MATLAB implementation. The quick select algorithm itself has a 5x speedup over state-of-the art GPU methods. This has enabled the processing of k-NNG construction on a data set containing 20 million image vectors, each with dimension 15,000, as part of a manifold embedding technique for analyzing the conformations of biomolecules.

机译：k最近邻图（k-NNG）和相关k最近邻方法（k-NN）在生物信息学，机器学习，数据挖掘，聚类分析和模式识别等领域具有广泛的应用。我们感兴趣的应用是流形嵌入。由于输入数据的维度大（<15k），基于空间细分的技术（如OBB，k-d树，BSP等）不可行。唯一的选择是蛮力搜索，它有两个不同的部分。首先根据预先定义的指标找到语料库中各个向量之间的距离。给定距离矩阵，第二步为查询数据集的每个成员选择k个最近的邻居。本文提出了一种分布式精确k最近邻图（k-NNG）构造方法的开发与实现。所提出的方法使用图形处理单元（GPU），并为使用GPU的分布式计算系统开发了多层并行性。它可以针对不同的集群规模进行扩展，集群中的每个计算节点都包含多个GPU。距离计算公式化为基本矩阵乘法和约简运算。为此，使用了优化的CUBLAS矩阵乘法库。支持各种距离度量标准，例如欧几里得，余弦和皮尔逊。对于k-NNG构造，提出了两种不同的方法。第一种基于称为批索引排序的方法，可通过三个排序操作来构建k-NNG。该方法使用Thrust库中用于GPU的优化基数排序实现。第二个是使用快速选择算法变体的最新GPU功能的高效实现。总体而言，基于批处理索引排序的k-NNG方法比分布式MATLAB实现快约13倍。快速选择算法本身的速度是最先进的GPU方法的5倍。这是对包含2000万个图像矢量的数据集进行k-NNG构造的处理，每个图像矢量的尺寸为15,000，作为用于分析生物分子构象的多种嵌入技术的一部分。

著录项

作者
Dashti, Ali.;
展开▼
作者单位

The University of Wisconsin - Milwaukee.;

展开▼
授予单位 The University of Wisconsin - Milwaukee.;
学科 Biomedical engineering.;Computer science.
学位 M.S.
年度 2013
页码 67 p.
总页数 67
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Efficient K-Nearest Neighbor Graph Construction Using MapReduce for Large-Scale Data Sets [J] . Tomohiro WARASHINA, Kazuo AOYAMA, Hiroshi SAWADA, IEICE transactions on information and systems . 2014,第12期

机译：使用MapReduce对大型数据集进行有效的K最近邻图构造
2. Sequential random k-nearest neighbor feature selection for high-dimensional data [J] . Park Chan Hee, Kim Seoung Bum Expert Systems with Application . 2015,第5期

机译：高维数据的顺序随机k最近邻特征选择
3. Hubness-aware shared neighbor distances for high-dimensional k-nearest neighbor classification [J] . Nenad Tomasev, Dunja Mladenic Knowledge and information systems . 2014,第1期

机译：高维k最近邻分类的感知度共享邻居距离
4. K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching [C] . Garcia V., Debreuve E., Nielsen F., 17th IEEE International Conference on Image Processing . 2010

机译：K近邻搜索：基于GPU的快速实现和对高维特征匹配的应用
5. Techniques for efficient k-nearest neighbor searching in non-ordered discrete and hybrid data spaces. [D] . Kolbe, Dashiell Matthews. 2010

机译：在无序离散和混合数据空间中有效进行k最近邻搜索的技术。
6. Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters [O] . Ali Dashti, Ivan Komarov, Roshan M. D’Souza -1

机译：GPU群集上大型高维数据集的k最近邻图的高效计算
7. Efficient computation of k-Nearest Neighbour Graphs for large high-dimensional data sets on GPU clusters. [O] . Ali Dashti, Ivan Komarov, Roshan M D'Souza 2013

机译：高效计算GpU集群上大型高维数据集的k-最近邻图。

Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on gpu clusters.

摘要

著录项

相似文献

相关主题

期刊订阅