Journal: Nucleic Acids Research

HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

Abstract

Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
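For readers unfamiliar with the algorithm that HipMCL parallelizes, the following is a minimal sketch of the textbook MCL iteration: alternate expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing columns) until the matrix stops changing. It is not HipMCL's implementation, which the abstract describes as distributed over MPI and OpenMP. It uses dense pure-Python lists, and the function names, the added self-loops, the pruning threshold and the toy graph are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def expand(m):
    """Expansion step: square the matrix (a two-step random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise each entry to the power r, then renormalize."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-12):
    """Toy dense MCL; returns clusters as sorted lists of node indices."""
    n = len(adj)
    # Assumption: add self-loops before normalizing, a common convention.
    m = normalize_columns(
        [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        # Assumption: zero out negligible entries, then renormalize.
        nxt = normalize_columns(
            [[x if x > prune else 0.0 for x in row] for row in nxt]
        )
        diff = max(abs(nxt[i][j] - m[i][j])
                   for i in range(n) for j in range(n))
        m = nxt
        if diff < tol:
            break
    # Rows that keep non-zero mass are "attractors"; the columns they
    # attract form one cluster. Identical supports are merged via a set.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy network: two triangles joined by the single edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adjacency)
print(clusters)
```

On this toy graph the two triangles should come out as separate clusters. The expansion step is a matrix-matrix product, so a dense version like this one is only practical for very small inputs. That cost is one way to see why running time and memory become the bottleneck the abstract describes on large networks.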
