Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Parallel Clustering Algorithm for Large Data Sets with Applications in Bioinformatics



Abstract

Large bioinformatics data sets make cluster identification time-consuming, which motivates a parallel algorithm for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed and identifies clusters by finding densely intraconnected subgraphs. We employ a minimum spanning tree (MST) representation of the graph and solve the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of the graph, for which a parallel algorithm is employed. Our high-level strategy for parallel MST construction is to first partition the graph, then construct MSTs for the partitioned subgraphs and for auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. Computational results indicate that, when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that the program can handle very large data clustering problems efficiently. We have implemented the clustering algorithm as the software CLUMP.
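
The partition, sub-MST, and merge strategy described in the abstract can be illustrated with a small serial sketch. This is not the CLUMP implementation. It assumes the data points are Euclidean vectors, that the graph is the complete distance graph, and that the vertices are split round-robin into parts. The names `kruskal`, `partitioned_mst`, and `full_mst` are hypothetical, and Kruskal's algorithm is used here only because it is compact. The sketch relies on the cycle property: an edge left out of the MST of any subgraph cannot be needed in an MST of the whole graph, so an MST of the union of the sub-MST edges is an MST of the original graph.

```python
import itertools
import math
import random


def kruskal(n_nodes, edges):
    """Return the edges of a minimum spanning forest using Kruskal's algorithm.

    `edges` is an iterable of (weight, u, v) tuples over nodes 0..n_nodes-1.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def partitioned_mst(points, n_parts):
    """Build an MST by partitioning, solving subproblems, and merging.

    Assumption: round-robin partitioning; any partition of the vertices works.
    """
    n = len(points)
    parts = [list(range(i, n, n_parts)) for i in range(n_parts)]
    candidates = []

    # Step 1: MST of the complete graph inside each part.
    for part in parts:
        edges = [(math.dist(points[u], points[v]), u, v)
                 for u, v in itertools.combinations(part, 2)]
        candidates.extend(kruskal(n, edges))

    # Step 2: MST of the auxiliary bipartite graph between each pair of parts.
    for a, b in itertools.combinations(parts, 2):
        edges = [(math.dist(points[u], points[v]), u, v) for u in a for v in b]
        candidates.extend(kruskal(n, edges))

    # Step 3: merge by taking an MST of the union of all sub-MST edges.
    return kruskal(n, candidates)


def full_mst(points):
    """Reference MST computed directly on the complete distance graph."""
    n = len(points)
    edges = [(math.dist(points[u], points[v]), u, v)
             for u, v in itertools.combinations(range(n), 2)]
    return kruskal(n, edges)


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(200)]
    merged = partitioned_mst(pts, n_parts=4)
    direct = full_mst(pts)
    print(len(merged), math.isclose(sum(w for w, _, _ in merged),
                                    sum(w for w, _, _ in direct)))
```

The sub-MSTs in steps 1 and 2 are independent of one another, which is what makes the strategy amenable to parallel execution. In this sketch they run one after another, and only the final merge in step 3 needs all of their results.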
