...
Journal: Experimental Mechanics

CUDA-enabled hierarchical Ward clustering of protein structures based on the nearest neighbour chain algorithm



Abstract

Clustering molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered so that subsequent computationally expensive evaluations can be restricted to reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time-consuming. In this paper, we analyse and evaluate different approaches to parallelizing the nearest neighbour chain algorithm on a graphics processing unit (GPU) to perform hierarchical Ward clustering of protein structures, using both atom-based root mean square deviation (RMSD) and rigid-body RMSD molecular distances. Our CUDA implementation on a GeForce Titan GPU achieves a speedup of around one order of magnitude over a multi-threaded CPU implementation on a Core-i7 2700. Furthermore, the runtimes compare favourably with ClusCo, another state-of-the-art CUDA-enabled protein structure clustering method, while achieving similar accuracy on the iTasser benchmark dataset. Our implementation has also been incorporated into the Biochemical Algorithms library to allow easy integration into biologists' workflows.
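
For readers unfamiliar with the nearest neighbour chain algorithm named in the abstract, the following is a minimal sequential sketch in Python. It is not the authors' CUDA implementation. It assumes the input is a symmetric matrix of squared pairwise distances (for example squared RMSD values) and applies the standard Lance-Williams update for Ward linkage. The function name `ward_nn_chain` and the convention of labelling a merged cluster by its smallest member index are illustrative choices, not taken from the paper.

```python
def ward_nn_chain(sq_dist):
    """Ward clustering via the nearest neighbour chain algorithm.

    sq_dist: symmetric n x n list of lists holding squared pairwise
    distances (assumed input format, e.g. squared RMSD values).
    Returns a list of merges (i, j, height) sorted by height, where a
    cluster is labelled by the smallest original index it contains.
    """
    n = len(sq_dist)
    d = [row[:] for row in sq_dist]      # working copy of the distances
    size = [1] * n                       # number of members per cluster
    active = set(range(n))               # labels of clusters still alive
    chain = []                           # the nearest neighbour chain
    merges = []

    while len(active) > 1:
        if not chain:
            chain.append(min(active))    # any active cluster may start a chain
        while True:
            a = chain[-1]
            # Start from the predecessor so that ties resolve in its favour;
            # this guarantees the chain cannot cycle.
            if len(chain) > 1:
                b, best = chain[-2], d[a][chain[-2]]
            else:
                b, best = None, float("inf")
            for c in active:
                if c != a and d[a][c] < best:
                    b, best = c, d[a][c]
            if len(chain) > 1 and b == chain[-2]:
                break                    # a and b are reciprocal nearest neighbours
            chain.append(b)

        # Remove the reciprocal pair from the chain and merge it.
        chain.pop()
        chain.pop()
        keep, drop = min(a, b), max(a, b)
        merges.append((keep, drop, best))

        # Lance-Williams update for Ward linkage on squared distances.
        for k in active:
            if k == keep or k == drop:
                continue
            total = size[keep] + size[drop] + size[k]
            new = ((size[keep] + size[k]) * d[keep][k]
                   + (size[drop] + size[k]) * d[drop][k]
                   - size[k] * best) / total
            d[keep][k] = d[k][keep] = new
        size[keep] += size[drop]
        active.remove(drop)

    # The chain finds merges out of height order; sorting restores
    # the usual bottom-up dendrogram order.
    return sorted(merges, key=lambda m: m[2])
```

The appeal of this formulation for large candidate sets is that each merge needs only nearest neighbour searches along the chain rather than a global search for the closest pair. Those searches and the distance updates are the natural candidates for parallelization, although how the paper distributes them on the GPU is not described in the abstract.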
