Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

机译：使用大规模并行图形处理单元计算的生物信息学中的快速并行马尔可夫聚类

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

Markov clustering is becoming a key algorithm with in bioinformatics for determining clusters in networks. For instance, clustering protein interaction networks is helping find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data on biological networks, performance and scalability issues are becoming a critical limiting factorin applications. Meanwhile, Graphics Processing (GPU)computing, which uses a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient and low cost option to achieve substantial performance gains over CPU approaches. This paper introduces a very fast Markov clustering algorithm (MCL) based on massive parallel computing in GPU. We use the Compute Unified Device Architecture (CUDA) to allow the GPU to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of the clustering algorithm. The key to optimizing our CUDA Markov Clustering (CUDAMCL) was utilizing ELLACK-R sparse data format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks datasets in bioinformatics applications. CUDA also allows us to use on-chip memory on the GPU efficiently, to lower the latency time thus circumventing a major issue in other parallel computing environments, such as Message Passing Interface (MPI). Here we describe the GPU algorithm and its application to several real world problems as well as to artificial datasets. We find that the principle factor causing variation in performance of the GPU approach is the relative sparseness of networks. Comparing GPU computation times against a modern quad-core CPU on the published(relatively sparse) standard BIOGRID protein interaction networks with 5156 and 23175 nodes, speed factors of 4times and 9 were obtained, respectively. On the Human Protein Reference Database, the spe--ed of clustering of19599 proteins was improved by a factor of 7 by the GPU algorithm. However, on artificially generated densely connected networks with 1600 to 4800 nodes, speedups by a factor in the range 40 to 120 times were readily obtained. As the results show, in all cases the GPU implementation is significantly faster than the original MCL running on CPU. Such approaches are allowing large-scale parallel computation on off-the-shelf desktop machines that were previously only possible on super-computing architectures, and have the potential to significantly change the way bioinformaticians and biologists compute and interact with their data.

机译：马尔可夫聚类正成为生物信息学中用于确定网络中聚类的关键算法。例如，聚类蛋白质相互作用网络正在帮助寻找与癌症等疾病有关的基因。但是，随着快速测序和其他技术在生物网络上生成大量数据，性能和可伸缩性问题已成为应用程序中的关键限制因素。同时，在GPU卡中使用大规模并行计算环境的图形处理（GPU）计算正成为一种非常强大，高效且低成本的选择，以实现与CPU方法相比可观的性能提升。本文介绍了一种基于GPU中大规模并行计算的非常快速的马尔可夫聚类算法（MCL）。我们使用Compute Unified Device Architecture（CUDA）来允许GPU执行并行稀疏矩阵矩阵计算和并行稀疏Markov矩阵归一化，这是聚类算法的核心。优化CUDA Markov聚类（CUDAMCL）的关键是利用ELLACK-R稀疏数据格式，以进行有效且细粒度的大规模并行处理，以应对生物信息学应用程序中交互网络数据集的稀疏性质。 CUDA还允许我们有效地使用GPU上的片上内存，以减少延迟时间，从而避免了诸如消息传递接口（MPI）之类的其他并行计算环境中的主要问题。在这里，我们描述了GPU算法及其在若干实际问题以及人工数据集中的应用。我们发现导致GPU方法性能变化的主要因素是网络的相对稀疏性。在已发布的（相对稀疏的）标准BIOGRID蛋白质交互网络上将GPU的计算时间与现代的四核CPU进行比较，其具有5156和23175个节点，速度因子分别为4倍和9。在人类蛋白质参考数据库中， -- 通过GPU算法，将19599种蛋白质的聚类图提高了7倍。但是，在具有1600至4800个节点的人工生成的紧密连接的网络上，可以轻松获得40倍至120倍范围内的加速。结果表明，在所有情况下，GPU的实现都比在CPU上运行的原始MCL显着更快。这种方法允许在现成的台式计算机上进行大规模并行计算，而以前只能在超级计算体系结构上进行，并且有可能显着改变生物信息学家和生物学家计算数据并与之交互的方式。

著录项

来源
《Ninth International Workshop on Parallel and Distributed Methods in Verification, Second International Workshop on High Performance Computational Systems Biology》|2010年|p.116-125|共10页
会议地点
作者
Bustamam Alhadi; Burrage Kevin; Hamilton Nicholas A.;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类计算技术、计算机技术;
关键词
Bioinformatics; CUDA; ELLPACK-R; GPU computing; MCL; Massively parallel computing; PPI network; Parallel Markov clustering;

机译：生物信息学; CUDA; ELLPACK-R; GPU计算; MCL;大规模并行计算; PPI网络;并行马尔可夫聚类;
入库时间 2022-08-26 14:20:58

相似文献

外文文献
中文文献
专利

1. Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format [J] . Bustamam Alhadi, Burrage Kevin, Hamilton Nicholas A. Computational Biology and Bioinformatics, IEEE/ACM Transactions on . 2012,第3期

机译：在具有CUDA和ELLPACK-R稀疏格式的GPU上使用大规模并行计算在生物信息学中进行快速并行Markov聚类
2. The fast multipole method on parallel clusters, multicore processors, and graphics processing units [J] . Darve E., Cecka C., Takahashi T. Comptes rendus. Mecanique . 2011,第2a3期

机译：并行集群，多核处理器和图形处理单元上的快速多极方法
3. GPUDePiCt: A Parallel Implementation of a Clustering Algorithm for Computing Degenerate Primers on Graphics Processing Units [J] . Cickovski Trevor, Flor Tiffany, Irving-Sachs Galen, Computational Biology and Bioinformatics, IEEE/ACM Transactions on . 2015,第2期

机译：GPUDePiCt：用于在图形处理单元上计算简并引物的聚类算法的并行实现
4. Fast Parallel Markov Clustering in Bioinformatics using Massively Parallel Graphics Processing Unit Computing [C] . Alhadi Bustamam, Kevin Burrage, Nicholas A. Hamilton International Workshop on Parallel and Distributed Methods in Verification . 2010

机译：使用大规模并行图形处理单元计算，生物信息学中快速并行Markov聚类
5. Massively parallel engineering simulations on graphics processors: Parallelization, synchronization, and approximation [D] . Steuben, John C. 2014

机译：图形处理器上的大规模并行工程仿真：并行化，同步和逼近
6. Parallelizing Affinity Propagation Using Graphics Processing Units for Spatial Cluster Analysis over Big Geospatial Data [O] . Xuan Shi -1

机译：使用图形处理单元对亲和力进行并行传播以对大地理空间数据进行空间聚类分析
7. Study on Consulting Air Combat Simulation of Cluster UAV Based on Mixed Parallel Computing Framework of Graphics Processing Unit [O] . Chien-Chun Kung 2018

机译：基于混合平行计算框架的集群UV咨询空战仿真研究

Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

摘要

著录项

相似文献

相关主题

期刊订阅