Journal of Parallel and Distributed Computing

On the design of high-performance algorithms for aligning multiple protein sequences on mesh-based multiprocessor architectures



Abstract

In this paper, we address the problem of multiple sequence alignment (MSA) for handling a very large number of protein sequences on mesh-based multiprocessor architectures. As the problem has been conclusively shown to be computationally complex, we employ the divisible load paradigm (also referred to as divisible load theory, DLT) to handle such a large number of sequences. We design an efficient computational engine that is capable of conducting MSAs by exploiting the underlying parallelism embedded in the computational steps of multiple sequence alignment algorithms. Specifically, we consider the standard Smith-Waterman (SW) algorithm in our implementation; however, our approach is by no means restricted to the SW class of algorithms. The treatment used in this paper is generic to a class of similar dynamic programming problems. Our approach is recursive in the sense that the quality of solutions can be refined continuously until an acceptable level of quality is achieved. After the first phase of computation, we design a heuristic scheme that renders the final solution for MSA. We conduct rigorous simulation experiments using several hundred homologous protein sequences derived from the Rattus norvegicus and Mus musculus databases of olfactory receptors. We quantify the performance using the speed-up metric and compare our algorithms with serial (single-machine) processing approaches. We validate our findings by comparing with the conventional equal load partitioning (ELP) strategy that is commonly used in the parallel processing literature. Based on our extensive simulation study, we observe that the DLT paradigm offers excellent speed-up characteristics and provides avenues for its use in several other problems related to biological sequence processing. This study is a first attempt at using the DLT paradigm to devise efficient strategies for handling the large-scale multiple protein sequence alignment problem on mesh-based multiprocessor systems.
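For readers unfamiliar with the Smith-Waterman algorithm named in the abstract, the following is a minimal serial sketch of its dynamic programming recurrence for a single pair of sequences. It is not the authors' implementation: the function name `smith_waterman_score`, the flat match/mismatch scores, and the linear gap penalty are all assumptions chosen for illustration (protein alignment in practice typically uses a substitution matrix), and the DLT-based partitioning across mesh processors is not shown.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    # Only the previous row of the score matrix is needed at each step.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two short, made-up amino acid strings sharing a common local region.
    print(smith_waterman_score("MKTAYIAK", "GGTAYIQQ"))
```

Each cell depends only on its left, upper, and upper-left neighbours, which is the kind of structure that lets the computation be split across processors.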
