Paralleled Fast Search and Find of Density Peaks clustering algorithm on GPUs with CUDA

Abstract

Fast Search and Find of Density Peaks (FSFDP) is a recently proposed clustering algorithm that has been applied successfully in many applications. However, it performs poorly on large datasets because computing the distance matrix and the potentials is time-consuming. In this paper, we propose a GPU-accelerated FSFDP implemented with CUDA to improve its performance. The thread/block models and shared memory usage are specifically designed to maximize utilization of the GPU's hardware resources, and a merge accumulation algorithm based on the odd and even positions of an array is also introduced. Experimental results show that, compared to the serial program on a single CPU core, our parallel implementation of FSFDP reaches speedups of 4.39X for the distance matrix calculation and 15.75X for the potentials calculation. Higher speedups can be expected for larger datasets until the device limits are reached. In addition, the CUDA stream mechanism is employed, and further time savings are obtained by hiding the memory latency of multiple kernels through two-way stream scheduling. Finally, we evaluate our GPU-based implementation on a GPU cluster of 9 nodes; compared to one GPU node, the program achieves a further 7.55X speedup.
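
The two steps the abstract names as bottlenecks, plus the odd/even merge accumulation, can be illustrated with a small serial reference sketch in Python. This is not the authors' CUDA code. The function names are hypothetical, the Gaussian form of the potential and the `sigma` parameter are assumptions (the abstract does not define the potential), and the reading of "odd and even positions" as a pairwise reduction that folds each odd-position element into its even-position neighbour is also an assumption.

```python
import math


def distance_matrix(points):
    """Serial O(n^2) Euclidean distance matrix (the first step the paper moves to the GPU)."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


def odd_even_sum(values):
    """Pairwise accumulation: each pass adds every odd-position element
    to the even-position element before it, halving the array.
    ASSUMPTION: this is one plausible reading of the abstract's
    'merge accumulation based on odd and even positions'."""
    buf = list(values)
    while len(buf) > 1:
        if len(buf) % 2:
            buf.append(0.0)  # pad so every even slot has an odd partner
        buf = [buf[k] + buf[k + 1] for k in range(0, len(buf), 2)]
    return buf[0] if buf else 0.0


def potentials(dist, sigma):
    """ASSUMED Gaussian-kernel potential: phi_i = sum_j exp(-(d_ij / sigma)^2).
    The sum includes the j == i term, which contributes exp(0) = 1."""
    return [odd_even_sum(math.exp(-(d / sigma) ** 2) for d in row) for row in dist]
```

On a GPU, the additions within one pass of such a reduction are independent of each other, so each pass could in principle be executed by many threads at once, leaving roughly log2(n) sequential passes instead of n sequential additions.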

Bibliographic details

Source
IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing | 2016 | 1 v. | 6 pages
Authors
Mi Li; Jie Huang; Jingpeng Wang
Original format
PDF
CLC classification
Computing technology, computer technology
Keywords
Graphics processing units; Instruction sets; Kernel; Clustering algorithms; Entropy; Software algorithms; Acceleration

Similar literature

1. Extended Fast Search Clustering Algorithm: Widely Density Clusters, No Density Peaks [J]. Zhang WenKai, Li Jing. Computer Science & Information Technology, 2015, No. 7.
2. Parallel Fast Walsh Transform Algorithm and Its Implementation with CUDA on GPUs [J]. Dusan Bikov, Iliya Bouyukliev. Cybernetics and Information Technologies: CIT, 2017, No. 5.
3. Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format [J]. Bustamam Alhadi, Burrage Kevin, Hamilton Nicholas A. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2012, No. 3.
4. Paralleled Fast Search and Find of Density Peaks clustering algorithm on GPUs with CUDA [C]. Mi Li, Jie Huang, Jingpeng Wang. IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, 2016.
5. Parallelizing Tabu Search Based Optimization Algorithm on GPUs [D]. Malleypally, Vinaya. 2018.
6. CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU [O]. Hanyu Jiang, Narayan Ganesan. 2016.
7. Extended fast search clustering algorithm: widely density clusters, no density peaks [O]. Zhang, Wenkai, Li, Jing. 2015.
