Finding conserved patterns in biological sequences, networks and genomes

Abstract

Biological patterns are widely used for identifying biologically interesting regions within macromolecules, classifying biological objects, predicting functions and studying evolution. Good pattern finding algorithms help biologists formulate and validate hypotheses and so gain important insights into the complex mechanisms of living things.

In this dissertation, we develop and improve algorithms for five biological pattern finding problems. For the multiple sequence alignment problem, we propose an alternative formulation in which the final alignment is obtained by preserving the pairwise alignments specified by the edges of a given tree. In contrast with traditional NP-hard formulations, our preserving alignment formulation can be solved in polynomial time without a heuristic, while achieving very good accuracy.

For the path matching problem, we take advantage of the linearity of the query path to reduce the problem to finding a longest weighted path in a directed acyclic graph. We can find the k top-scoring paths in a network that match the query path in polynomial time. Because many biological pathways are not linear, our graph matching approach accepts a non-linear graph as the query. Our graph matching formulation overcomes a common weakness of previous approaches, namely that they give no guarantee on the quality of their results.

For the gene cluster finding problem, we investigate a formulation based on constraining the overall size of a cluster, and we develop statistical significance estimates that allow direct comparison of clusters of different sizes. We explore both a restricted version, which requires orthologous genes to be strictly ordered within each cluster, and the unrestricted problem, which allows paralogous genes within a genome and clusters that may not appear in every genome. We solve the first problem in polynomial time and develop practical exact algorithms for the second.

For the gene cluster querying problem, we use a querying strategy to propose an efficient approach for investigating the clustering of related genes across multiple genomes for a given gene cluster. By analyzing gene clustering in 400 bacterial genomes, we show that our algorithm is efficient enough to study gene clusters across hundreds of genomes.
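
The abstract states that an alignment preserving the pairwise alignments on every edge of a given tree can be computed in polynomial time. One way to see why such a formulation is tractable is that, on a tree, pairwise alignments can be glued together through their shared sequences without ever creating a conflict, because there is no cycle that could impose contradictory constraints. The Python sketch below illustrates that gluing step only; it is not taken from the dissertation, and the function names (`merge_along_tree`, `_add_leaf`, `project`), the input format (one pair of gapped strings per tree edge) and the `-` gap symbol are assumptions made for illustration.

```python
from collections import defaultdict, deque

GAP = "-"  # assumed gap symbol


def _add_leaf(msa, u, v, gu, gv):
    """Attach sequence v to the MSA by gluing on u's residues, so that the
    MSA restricted to rows (u, v) reproduces the pairwise alignment (gu, gv)."""
    row_u = msa[u]
    new = {s: [] for s in msa}
    new_v = []
    i = k = 0
    while i < len(row_u) or k < len(gu):
        if i < len(row_u) and row_u[i] == GAP:
            # u has a gap in this MSA column: keep the column, v gets a gap
            for s in msa:
                new[s].append(msa[s][i])
            new_v.append(GAP)
            i += 1
        elif k < len(gu) and gu[k] == GAP:
            # u has a gap in the pairwise column: new column with only v's symbol
            for s in msa:
                new[s].append(GAP)
            new_v.append(gv[k])
            k += 1
        else:
            # both point at the same residue of u: merge the two columns
            if i >= len(row_u) or k >= len(gu) or row_u[i] != gu[k]:
                raise ValueError(f"alignments disagree on sequence {u!r}")
            for s in msa:
                new[s].append(msa[s][i])
            new_v.append(gv[k])
            i += 1
            k += 1
    new[v] = new_v
    return new


def merge_along_tree(edges):
    """edges: (a, b, gapped_a, gapped_b) tuples, one pairwise alignment per
    edge of a tree (assumed acyclic). Returns {name: gapped row}."""
    adj = defaultdict(list)
    for a, b, ga, gb in edges:
        if len(ga) != len(gb):
            raise ValueError("pairwise rows must have equal length")
        adj[a].append((b, ga, gb))
        adj[b].append((a, gb, ga))
    root = edges[0][0]
    msa = {root: list(edges[0][2].replace(GAP, ""))}
    queue = deque([root])
    while queue:  # breadth-first walk: each tree edge adds one new sequence
        u = queue.popleft()
        for v, gu, gv in adj[u]:
            if v not in msa:
                msa = _add_leaf(msa, u, v, gu, gv)
                queue.append(v)
    return {name: "".join(row) for name, row in msa.items()}


def project(msa, a, b):
    """Restrict an MSA to rows a and b, dropping columns gapped in both."""
    cols = [(x, y) for x, y in zip(msa[a], msa[b]) if not (x == GAP and y == GAP)]
    return "".join(x for x, _ in cols), "".join(y for _, y in cols)
```

For example, with `edges = [("A", "B", "AC-T", "A-GT"), ("A", "C", "AC-T", "ACGT")]`, `merge_along_tree(edges)` returns the rows `AC--T`, `A-G-T` and `AC-GT`, and `project` recovers both input pairwise alignments exactly. Each attachment costs time proportional to the number of columns times the number of sequences already aligned, so the whole merge is polynomial.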
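
The abstract reduces path matching to finding a longest weighted path in a directed acyclic graph by exploiting the linearity of the query path. A common way to set up such a reduction (assumed here; the abstract does not give the construction) is a layered graph whose nodes are pairs (query position i, network vertex v): an edge joins (i, u) to (i + 1, v) whenever u -> v is a network edge, and node (i, v) is weighted by the similarity of query node i and vertex v. Every edge goes from layer i to layer i + 1, so the layers already form a topological order and the longest path falls out of one dynamic-programming sweep. The Python sketch below also assumes no insertions or deletions relative to the query and allows a network vertex to be revisited; `best_matching_path` and the scoring function `sim` are hypothetical names.

```python
def best_matching_path(query, adj, sim):
    """Highest-scoring network walk v_0..v_{m-1} matching query q_0..q_{m-1}.

    query: list of query-path nodes; adj: {vertex: iterable of successors};
    sim(q, v): similarity score of query node q and network vertex v.
    Returns (score, [v_0, ..., v_{m-1}]) or None if no walk of that length exists.
    Simplifying assumptions: no gaps, and vertices may repeat.
    """
    vertices = set(adj)
    for nbrs in adj.values():
        vertices.update(nbrs)
    # Layer 0 of the DAG: node (0, v) for every network vertex v.
    best = {v: sim(query[0], v) for v in vertices}
    pointers = []  # pointers[i - 1][v]: predecessor of (i, v) on its best path
    for i in range(1, len(query)):
        layer, ptr = {}, {}
        # DAG edge (i - 1, u) -> (i, v) for each network edge u -> v.
        for u, score in best.items():
            for v in adj.get(u, ()):
                cand = score + sim(query[i], v)
                if v not in layer or cand > layer[v]:
                    layer[v], ptr[v] = cand, u
        best = layer
        pointers.append(ptr)
    if not best:
        return None
    end = max(best, key=best.get)
    path = [end]
    for ptr in reversed(pointers):  # follow back-pointers down to layer 0
        path.append(ptr[path[-1]])
    return best[end], path[::-1]
```

The sweep touches each network edge once per query position, so it runs in O(m(|V| + |E|)) time for a query of length m. Keeping the k best (score, predecessor) entries per DAG node instead of one is a standard way to extend the same sweep toward the k top-scoring paths the abstract mentions.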

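The abstract's gene cluster formulations constrain the overall size of a cluster, and its querying approach looks for a given cluster across many genomes. As a much simpler illustration of a size constraint (an assumption, not the dissertation's definition), suppose a set of gene families counts as clustered in a genome when all of them fall within a window of at most `max_size` consecutive genes. The shortest such window can be found with a standard two-pointer sliding window, and a query then scans each genome once. The sketch ignores gene order, strand, the statistical significance estimates and the restricted (strictly ordered) variant; `min_cluster_window` and `genomes_with_cluster` are hypothetical names.

```python
from collections import Counter


def min_cluster_window(genome, cluster):
    """genome: gene-family labels in chromosomal order; cluster: set of labels.
    Returns (start, end) of the shortest slice genome[start:end] containing
    every family in cluster, or None if some family is missing."""
    need = set(cluster)
    if not need:
        return None
    counts = Counter()
    covered = 0  # number of distinct cluster families inside the window
    best = None
    left = 0
    for right, fam in enumerate(genome):
        if fam in need:
            counts[fam] += 1
            if counts[fam] == 1:
                covered += 1
        # shrink from the left while the window still covers the cluster
        while covered == len(need):
            if best is None or right + 1 - left < best[1] - best[0]:
                best = (left, right + 1)
            out = genome[left]
            if out in need:
                counts[out] -= 1
                if counts[out] == 0:
                    covered -= 1
            left += 1
    return best


def genomes_with_cluster(genomes, cluster, max_size):
    """Names of genomes in which the cluster fits in <= max_size consecutive
    genes (assumed, illustrative definition of 'clustered')."""
    hits = []
    for name, genome in genomes.items():
        win = min_cluster_window(genome, cluster)
        if win is not None and win[1] - win[0] <= max_size:
            hits.append(name)
    return hits
```

For instance, in the genome `["x", "a", "y", "b", "c", "a"]` the shortest window covering `{"a", "b", "c"}` is `(3, 6)`, that is `["b", "c", "a"]`. Each genome is scanned once with two pointers, so a query over many genomes is linear in their total length.
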
Bibliographic details

  • Author

    Yang Qingwu

  • Year: 2009
  • Original format: PDF
  • Language: en_US
