
Paralleled Fast Search and Find of Density Peaks clustering algorithm on GPUs with CUDA



Abstract

Fast Search and Find of Density Peaks (FSFDP) is a recently proposed clustering algorithm that has already been applied successfully in many applications. However, it performs poorly on large datasets because computing the distance matrix and the potentials is time-consuming. In this paper, we propose a GPU-accelerated FSFDP implemented with CUDA to improve its performance. The thread/block model and the shared-memory usage are carefully designed to maximize utilization of the GPU's hardware resources, and a merge accumulation algorithm based on the odd and even positions of an array is also introduced. Experimental results show that our parallel implementation of FSFDP achieves speedups of 4.39X and 15.75X for the calculation of the distance matrix and the potentials, respectively, compared with the serial program on a single CPU core. Higher speedups can be expected for larger datasets until the device limits are reached. In addition, the CUDA stream mechanism is employed, and extra time savings are obtained by hiding the memory latency of multiple kernels through two-way stream scheduling. Finally, we evaluate our GPU-based implementation on a GPU cluster of 9 nodes; compared with a single GPU node, the program achieves a further 7.55X speedup.
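The abstract does not spell out the merge accumulation scheme, so the following is only a minimal serial sketch of one plausible reading: on each pass, every even-position element absorbs its odd-position neighbour, halving the array until a single sum remains. The function name `odd_even_merge_sum` is hypothetical, and plain Python is used here purely for illustration; in a CUDA kernel each pairwise addition within a pass would be handled by a separate thread, with a synchronization between passes.

```python
def odd_even_merge_sum(values):
    """Sum a sequence by repeatedly merging odd positions into even positions.

    Illustrative assumption only: this is one reading of the "merge
    accumulation based on odd and even positions" named in the abstract,
    not the authors' actual kernel.
    """
    buf = list(values)
    while len(buf) > 1:
        evens = buf[0::2]  # elements at positions 0, 2, 4, ...
        odds = buf[1::2]   # elements at positions 1, 3, 5, ...
        if len(odds) < len(evens):
            odds.append(0.0)  # pad so the last even element has a partner
        # Each addition in this pass is independent of the others, which is
        # what would let a GPU perform them concurrently.
        buf = [e + o for e, o in zip(evens, odds)]
    return buf[0] if buf else 0.0


# Example: accumulate one row of contributions into a single value.
row = [0.5, 1.5, 2.0, 3.0, 4.0]
total = odd_even_merge_sum(row)
```

For an array of n elements this takes about log2(n) passes instead of n - 1 sequential additions, which is the usual motivation for tree-style reductions on parallel hardware.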

