Conference: IEEE International Parallel and Distributed Processing Symposium Workshops

LBE: A Computational Load Balancing Algorithm for Speeding up Parallel Peptide Search in Mass-Spectrometry Based Proteomics

Abstract

The most commonly employed method for peptide identification in mass-spectrometry based proteomics compares experimentally obtained tandem mass (MS/MS) spectra against a set of theoretical MS/MS spectra. The theoretical spectra are predicted from a protein sequence database. Most state-of-the-art peptide search algorithms index the theoretical spectra so that, when searching an experimental MS/MS spectrum, they can quickly filter in the relevant (similar) indexed spectra. This filtering substantially reduces the number of computationally expensive spectrum-to-spectrum comparisons required. However, the number of predicted (and indexed) theoretical spectra grows exponentially as the number of post-translational modifications increases, creating a memory and I/O bottleneck. In this paper, we present a parallel algorithm, called LBE, for efficient partitioning of theoretical spectra data on a distributed-memory architecture. Our proposed algorithm first groups similar theoretical spectra. The groups are then finely split across the system so that machines perform a nearly equal amount of work when querying an MS/MS spectrum. Our results show that the compute load imbalance with LBE-based data distribution is ≤ 20%, allowing speedups of orders of magnitude over existing methods. The proposed algorithm has been implemented on a compute cluster using the MPI library. Experimental results for increasing index sizes are reported in terms of execution time, speedup, and memory footprint. To the best of our knowledge, LBE is the first load-balancing technique for MS/MS proteomics data on distributed-memory clusters that incorporates proteomics domain knowledge for efficient load balancing. Source code is available at: https://github.com/pcdslab/lbdslim/tree/mpi.
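
The group-then-split idea in the abstract can be illustrated with a small sketch. The abstract does not say how similarity is defined, how groups are split, or how load imbalance is measured, so everything below is an assumption made for illustration: spectra are stand-in tuples, similarity is a hypothetical key function (here, the unmodified base sequence), each group is dealt cyclically across ranks, and imbalance is taken as (max - mean) / mean of the per-rank candidate counts. It is not the LBE implementation and does not use MPI.

```python
from collections import defaultdict


def partition_grouped(items, key, n_ranks):
    """Group items by `key`, then deal each group's members cyclically across ranks.

    Assumption: cyclic dealing stands in for the "fine split" in the abstract.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)

    parts = [[] for _ in range(n_ranks)]
    slot = 0  # keeps counting across groups so small groups do not all start on rank 0
    for k in sorted(groups):
        for item in groups[k]:
            parts[slot % n_ranks].append(item)
            slot += 1
    return parts


def partition_chunked(items, key, n_ranks):
    """Baseline for comparison: sort by key and cut into contiguous chunks."""
    ordered = sorted(items, key=key)
    size = -(-len(ordered) // n_ranks)  # ceiling division
    return [ordered[i * size:(i + 1) * size] for i in range(n_ranks)]


def load_imbalance(parts, is_candidate):
    """(max - mean) / mean of the number of candidate items held by each rank.

    Assumption: this is one common definition, not necessarily the paper's.
    """
    loads = [sum(1 for item in part if is_candidate(item)) for part in parts]
    mean = sum(loads) / len(loads)
    return 0.0 if mean == 0 else (max(loads) - mean) / mean


# Hypothetical data: 4 base sequences, each with 8 modified variants.
spectra = [(seq, variant) for seq in ("PEPA", "PEPB", "PEPC", "PEPD")
           for variant in range(8)]
base_sequence = lambda s: s[0]          # hypothetical similarity key
matches_query = lambda s: s[0] == "PEPA"  # a query that filters in one group

grouped = partition_grouped(spectra, base_sequence, n_ranks=4)
chunked = partition_chunked(spectra, base_sequence, n_ranks=4)

print(load_imbalance(grouped, matches_query))  # 0.0: each rank holds 2 candidates
print(load_imbalance(chunked, matches_query))  # 3.0: one rank holds all 8 candidates
```

In this toy setting, a query that filters in a single group of similar spectra touches every rank equally under the grouped split, whereas contiguous chunking leaves one rank doing all the comparisons.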