High performance computational biology algorithms.


Abstract

Multiple Sequence Alignment (MSA) of biological sequences is a fundamental problem in computational biology because of its critical role in a wide range of applications, including haplotype reconstruction, sequence homology, phylogenetic analysis, and prediction of evolutionary origins. The MSA problem is considered NP-hard, and known heuristics for it do not scale well as the number of sequences increases. At the same time, a new breed of fast sequencing techniques makes it possible to generate thousands of sequences very quickly. Rapid sequence analysis therefore calls for fast MSA algorithms that scale well with dataset size. In this dissertation, we propose a novel domain decomposition based technique to solve the multiple sequence alignment problem on multiprocessing platforms. In addition to yielding better quality, the technique gives a large advantage in execution time and memory requirements. The proposed strategy reduces the time complexity of any known heuristic of O(N^x) complexity by a factor of O(1/p^x), where N is the number of sequences, x depends on the underlying heuristic approach, and p is the number of processing nodes. In particular, we propose a highly scalable algorithm, Sample-Align-D, for aligning biological sequences using the Muscle system as the underlying heuristic. We also develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align large numbers of reads from single or multiple reference genomes obtained by pyrosequencing. The proposed alignment algorithm accurately aligns the error-prone reads in a short period of time. The proposed algorithms have been implemented on a cluster of workstations using the MPI library. We report high quality multiple alignment of up to 0.5 million reads, and our analysis suggests that 10 million or more reads can be aligned using our parallel algorithm. The algorithms are shown to be highly scalable and exhibit super-linear speedups with increasing numbers of processors.
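
The complexity claim in the abstract can be illustrated with a small arithmetic sketch. It is not taken from the dissertation: it assumes an idealized cost model in which the heuristic costs exactly N^x operations, the N sequences are split evenly over p nodes, and the cost of decomposition, communication, and merging is ignored. The function names are hypothetical.

```python
import math


def serial_cost(n: int, x: float) -> float:
    """Idealized cost of an O(N^x) heuristic on n sequences (constant factor 1, an assumption)."""
    return float(n) ** x


def per_node_cost(n: int, p: int, x: float) -> float:
    """Idealized cost on one node when n sequences are split evenly over p nodes.

    Assumption: decomposition, communication, and merge overheads are ignored.
    """
    return (n / p) ** x


def ideal_speedup(n: int, p: int, x: float) -> float:
    """Ratio of serial cost to per-node cost; algebraically equal to p ** x."""
    return serial_cost(n, x) / per_node_cost(n, p, x)


if __name__ == "__main__":
    # For x > 1 the idealized speedup p ** x grows faster than p.
    for p in (2, 4, 8):
        print(p, ideal_speedup(1000, p, 2.0))
```

Under this model each node does (N/p)^x work, so the per-node cost shrinks by a factor of 1/p^x. For x > 1 the resulting ratio p^x exceeds p, which is consistent with the super-linear speedups reported in the abstract, although real runs also pay overheads that the sketch leaves out.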


Bibliographic record

Author: Saeed, Fahad
Author affiliation: University of Illinois at Chicago
Degree-granting institution: University of Illinois at Chicago
Subjects: Biology Genetics; Biology Bioinformatics; Engineering Computer
Degree: Ph.D.
Year: 2010
Pages: 155 p.
Original format: PDF
Language of text: English
Chinese Library Classification: Remote sensing technology

