High performance computational biology algorithms.

Abstract

Multiple sequence alignment (MSA) of biological sequences is a fundamental problem in computational biology because of its critical significance in wide-ranging applications, including haplotype reconstruction, sequence homology, phylogenetic analysis, and prediction of evolutionary origins. The MSA problem is considered NP-hard, and known heuristics for it do not scale well as the number of sequences increases. On the other hand, with the advent of a new breed of fast sequencing techniques, it is now possible to generate thousands of sequences very quickly. For rapid sequence analysis, it is therefore desirable to develop fast MSA algorithms that scale well with dataset size. In this dissertation, we propose a novel technique based on domain decomposition for solving the multiple sequence alignment problem on multiprocessing platforms. In addition to yielding better quality, the technique offers a large advantage in execution time and memory requirements. The proposed strategy reduces the time complexity of any known heuristic of O(N^x) complexity by a factor of O(1/p^x), where N is the number of sequences, x depends on the underlying heuristic approach, and p is the number of processing nodes. In particular, we propose a highly scalable algorithm, Sample-Align-D, for aligning biological sequences using the Muscle system as the underlying heuristic. We also develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align large numbers of reads, from single or multiple reference genomes, obtained by pyrosequencing. The proposed alignment algorithm accurately aligns the erroneous reads in a short period of time. The proposed algorithms have been implemented on a cluster of workstations using the MPI library. We report high-quality multiple alignments of up to 0.5 million reads, and our analysis suggests that 10 million or more reads can be aligned using our parallel algorithm. The algorithms are shown to be highly scalable and exhibit super-linear speedups with an increasing number of processors.
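The complexity claim in the abstract can be illustrated with a small calculation. The Python sketch below is not taken from the dissertation: it assumes an idealized cost model in which a heuristic does work proportional to N^x, the N sequences are split evenly over p nodes, and the costs of sampling, communication, and merging partial alignments are ignored. The function names are hypothetical.

```python
def heuristic_cost(n_sequences: int, x: float) -> float:
    """Idealized serial cost: work grows as N**x (constant factors dropped)."""
    return float(n_sequences) ** x


def per_node_cost(n_sequences: int, n_nodes: int, x: float) -> float:
    """Idealized cost on one node when N sequences are split evenly over p nodes.

    Assumption: each node runs the same heuristic on N/p sequences and
    nothing else (no communication or merge step) is counted.
    """
    return (n_sequences / n_nodes) ** x


def ideal_speedup(n_sequences: int, n_nodes: int, x: float) -> float:
    """Ratio of serial cost to per-node cost; equals p**x under this model."""
    return heuristic_cost(n_sequences, x) / per_node_cost(n_sequences, n_nodes, x)


if __name__ == "__main__":
    # Example values chosen for illustration only.
    for p in (2, 4, 8, 16):
        print(p, ideal_speedup(n_sequences=10_000, n_nodes=p, x=2.0))
```

Under these assumptions the per-node cost is (N/p)^x, so the ratio to the serial cost is p^x. For any x greater than 1 this exceeds p, which is consistent with the super-linear speedups the abstract reports, although real speedups also depend on the overheads this model leaves out.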

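Bibliographic record

  • Author

    Saeed, Fahad

  • Author affiliation

    University of Illinois at Chicago

  • Degree-granting institution: University of Illinois at Chicago
  • Subjects: Biology, Genetics; Biology, Bioinformatics; Engineering, Computer
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 155 p.
  • Total pages: 155
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Remote sensing technology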