首页> 中文期刊> 《计算机工程与科学》 >基于CombBLAS的同辈压力图聚类并行算法的设计与实现

基于CombBLAS的同辈压力图聚类并行算法的设计与实现

         

摘要

图聚类是指把图中相对连接紧密的顶点及其相关的边分组形成一个子图的过程,在包括机器学习、数据挖掘、模式识别、图像分析及生物信息等领域有着广泛应用.但是,随着大数据时代的到来,图数据海量增长.面对广泛的大规模图计算需求,由于图结构本身的不规则性,单机算法运行效率低下,用传统的并行计算方法进行图计算难以获得高性能.使用线性代数的方法在Combinatorial BLAS上实现了同辈压力(Peer Pressure)图聚类的分布式算法,首先将该图聚类的算法转换为对稀疏矩阵的运算,从而结构化表示图的不规则数据结构及接入模式,然后基于MPI编程模型将其并行实现.实验结果表明,在并行处理规模达到43亿的由稀疏矩阵表示的超大规模图时,基于线性代数表示的同辈压力图聚类算法在曙光超级计算机上取得了较高的并行性能及良好的可扩展性,在64个核上获得了40.1的并行加速.%Graph clustering is a problem of determining natural groups with high connectivity in a graph.This can be useful in fields such as machine learning,data mining,pattern recognition,image analysis and bioinformatics.To meet the graph-theoretic analysis demands of emerging"big data" applications,it is essential to speed up the underlying graph problems of current parallel systems.However,it is difficult to parallelize large-scale graph computation and achieve good performance using traditional approaches due to their irregular graph structure and low operation intensity.We implement a scalable distributed-memory algorithm for peer pressure graph clustering using the sparse matrix infrastructure in Combinatorial BLAS.We first convert the peer pressure graph clustering algorithm to sparse matrix computation,which allows irregular data structures and access patterns in parallel applications to be represented and can efficiently address the graph parallel challenge.Finally,the proposed algorithm is parallelized based on the MPI programming model.Experiments show that when the scale of the graph represented by a sparse matrix is up to 4.3 billion,the parallel peer pressure clustering algorithm based on linear algebraic has high performance and is well scalable on the Dawning Supercomputer,and the speedup can be up to 40.1x when the number of core scales to 64.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号