首页> 外文学位 >Algorithms for reliable computing in molecular biology.
【24h】

Algorithms for reliable computing in molecular biology.

机译:分子生物学中可靠计算的算法。

获取原文
获取原文并翻译 | 示例

摘要

Most biological systems are composed of interacting constituents which collectively determine the behavior of the system. Graph theory provides an appropriate language for a quantitative description of such systems. In this thesis we study two examples of physical processes on biological networks - evolving amino acid sequences and self-assembling macromolecules. Both these processes are commonly modeled using appropriately defined networks. However, these models are often computationally intractable using standard algorithms and quantitative investigations rely almost exclusively on heuristics that do not provide guarantees on their accuracy. This thesis presents techniques with reliable error guarantees for computations on both these systems. In most non-trivial instances these methods are orders of magnitude more efficient than existing exact methods and are often comparable in run times to popular heuristics. A common theme behind these methods is the recognition that for most instances the 'hardness' of the computational problem is limited to subgraphs of the full network and that by a process of isolating and solving the problem on these hard regions, we can piece together a complete solution to the original problem. Clearly, the efficiency of any such program depends on the relative size of these computationally intractable islands and how precisely we can identify them. This thesis attempts to answer some of these questions for the following computations For the problem of computing the weighted most parsimonious phylogeny on multistate characters, we present a provably optimal method which relies on extending the standard Buneman pruning from binary character to characters with arbitrary finite number of states. We use an integer linear program to solve the maximum parsimony problem on this pruned subgraph on large data sets in time competitive with popular heuristics. Next, we present a novel optimization based Markov chain to sample phylogenies. We prove that at sufficiently high temperatures (a measure for the roughness of the probability landscape) our method is guaranteed to give well mixed samples from the likelihood distribution in steps that grow polynomially with the number of input taxa and the sequence length. Finally, we present two algorithms for simulating general continuous time Markov models, which include models used in simulating self-assembly in biological systems. These algorithms also rely on identifying intractable subgraphs by a method we call Automated Discovery. We then solve the Markov model exactly on these 'stiff' sub-graphs using complete spectral decomposition. We also propose a method with arbitrary user-defined accuracy which does not rely on a full spectral decomposition. We empirically validate the efficiency of these methods for some popular networks used in the literature.
机译:大多数生物系统由相互作用的成分组成,这些成分共同决定系统的行为。图论为此类系统的定量描述提供了一种合适的语言。在本文中,我们研究了生物网络上物理过程的两个例子-进化的氨基酸序列和自组装大分子。通常使用适当定义的网络对这两个过程进行建模。但是,这些模型通常使用标准算法在计算上难以处理,并且定量研究几乎完全依赖于不能保证其准确性的试探法。本文提出了在两个系统上都具有可靠的误差保证技术的计算技术。在大多数非平凡实例中,这些方法的效率要比现有的精确方法高几个数量级,并且在运行时间上通常可以与流行的启发式方法相提并论。这些方法背后的一个共同主题是认识到,在大多数情况下,计算问题的“硬度”仅限于整个网络的子图,并且通过在这些困难区域上隔离和解决问题的过程,我们可以将完整解决原始问题。显然,任何此类程序的效率都取决于这些难于计算的孤岛的相对大小,以及我们能够如何精确地识别它们。本文试图为以下计算回答一些问题对于计算多状态字符的加权最简约系统发育问题,我们提出了一种可证明的最佳方法,该方法依赖于将标准Buneman修剪从二进制字符扩展到具有任意有限数的字符状态。我们使用整数线性程序来解决此修剪后的子图上大数据集上的最大简约问题,从而与流行的启发式算法在时间上竞争。接下来,我们提出一种基于马尔可夫链的新型优化样本系统发育。我们证明,在足够高的温度下(一种衡量概率分布图粗糙度的方法),我们的方法可以保证从似然分布中逐步混合出样本,并随着输入分类单元的数量和序列长度的增长呈多项式增长。最后,我们提出了两种用于模拟通用连续时间马尔可夫模型的算法,其中包括用于模拟生物系统中自组装的模型。这些算法还依靠我们称为自动发现的方法来识别难处理的子图。然后,我们使用完整的频谱分解,在这些“刚度”子图上精确求解Markov模型。我们还提出了一种具有任意用户定义精度的方法,该方法不依赖于完整的频谱分解。我们经验地验证了这些方法对于文献中使用的一些流行网络的效率。

著录项

  • 作者

    Misra, Navodit.;

  • 作者单位

    Carnegie Mellon University.;

  • 授予单位 Carnegie Mellon University.;
  • 学科 Biology Bioinformatics.
  • 学位 Ph.D.
  • 年度 2010
  • 页码 93 p.
  • 总页数 93
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号