首页> 外文学位 >Probabilistic computational methods for structural alignment of RNA sequences.
【24h】

Probabilistic computational methods for structural alignment of RNA sequences.

机译:RNA序列结构比对的概率计算方法。

获取原文
获取原文并翻译 | 示例

摘要

In this thesis, the problem of structural alignment of homologous RNA sequences is addressed. The structural alignment of a given set of RNA sequences is a secondary structure for each sequence, such that the structures are similar to each other, and a sequence alignment between the sequences that is conforming with the secondary structures. A solution to this problem was proposed by Sankoff as a dynamic programming algorithm whose time and memory complexities are polynomial in the length of shortest sequence and exponential in the number of input sequences, respectively. Variants of Sankoff's method employ constraints that reduce the computation by restricting the allowed alignments or structures. In the first part of the thesis, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed and these probabilities are additively combined to obtain probabilities of co-incidence. The constraints on alignments are computed by adaptively thresholding these probabilities to determine co-incidence constraints for pruning of computations that hold with high probability. The proposed constraints are implemented into Dynalign, a free energy minimization algorithm for structural alignment. Compared with prior non-adaptive approaches, the probabilistic constraints offer a significant reduction in computation time along with a marginal increase in base pair prediction accuracy.;Next, a novel algorithm for structural alignment of two RNA sequences is presented. The similarity of the structures is imposed by matched helical regions in the structural alignnment. A matched helical region represents a conserved helix in each structure. Compared to the structural alignment models of previous methods, the matched helical regions extend the possibilities for alignment of paired nucleotides, which enables the new structural alignment space to better accommodate the structural variability within a sequence family.;A probability distribution over the space of structural alignments is proposed based on pseudo-free energy changes, that account for both stability of structures and plausibility of the sequence alignment. Three different problems are addressed for structural alignment prediction as inferences from the distribution: 1. Estimation of the maximum a posteriori (MAP) structural alignment, i.e., the structural alignment with highest probability in the space, 2. Computation of base pairing probabilities of nucleotides in each sequence, 3. Sampling the structural alignment space for analysis of modes of the distribution over structural alignments.;Finally, the problem of structure prediction for an arbitrary number of homologous sequences is addressed. To circumvent the computational complexity, the prediction is broken down into an iterative computation of base pairing probabilities for each sequence using extrinsic information for base pairing derived from other sequences. For a sequence, extrinsic information is generated by using base pairing probabilities for other sequences from the previous iteration and wrapping them on the sequence in consideration via the co-incidence probabilities for alignment between the sequences. The algorithm iteratively updates the base pairing probabilities and extrinsic information for each sequence and the final set of base pairing probabilities are utilized for prediction of structures. Compared to other multi-sequence structure prediction methods, the proposed method has higher or comparable accuracy of structure prediction and time and memory requirements scale favorably with increasing number of input sequences as compared to joint alignment and folding algorithms.
机译:在本文中,解决了同源RNA序列的结构比对问题。一组给定的RNA序列的结构比对是每个序列的二级结构,以使结构彼此相似,并且是与二级结构相符的序列之间的序列比对。 Sankoff提出了一种解决此问题的解决方案,作为一种动态编程算法,其时间和存储复杂度在最短序列的长度上是多项式,在输入序列的数量上是指数。 Sankoff方法的变体采用了约束,这些约束通过限制允许的路线或结构来减少计算量。在论文的第一部分,提出了一种新的方法,旨在基于核苷酸比对和插入后验概率建立比对约束。使用隐藏的马尔可夫模型,计算对齐和插入的后验概率,并将这些概率累加合并以获得重合概率。通过自适应地对这些概率进行阈值计算来确定对齐约束,以确定用于修剪具有高概率的计算的重合约束。提出的约束被实现到Dynalign中,这是一种用于结构对齐的自由能最小化算法。与现有的非自适应方法相比,概率约束条件显着减少了计算时间,同时碱基对预测精度也略有提高。其次,提出了一种用于两个RNA序列结构比对的新算法。结构的相似性是由结构排列中匹配的螺旋区域决定的。匹配的螺旋区域代表每个结构中的保守螺旋。与先前方法的结构比对模型相比,匹配的螺旋区扩展了配对核苷酸比对的可能性,这使新的结构比对空间能够更好地适应序列家族内的结构变异性。基于伪自由能变化提出了比对,其考虑了结构的稳定性和序列比对的合理性。根据分布推断,针对结构比对预测解决了三个不同的问题:1.估计最大后验(MAP)结构比对,即在空间中具有最高概率的结构比对; 2.计算核苷酸的碱基配对概率在每个序列中,3.对结构比对空间进行采样,以分析结构比对中的分布模式。最后,解决了任意数目的同源序列的结构预测问题。为了避免计算复杂性,使用从其他序列得出的碱基配对的外在信息,将预测分解为每个序列的碱基配对概率的迭代计算。对于序列,通过使用来自先前迭代的其他序列的碱基配对概率,并通过考虑序列间比对的共生概率将它们包装在考虑的序列上,来生成外部信息。该算法迭代地更新每个序列的碱基配对概率和外部信息,并且利用碱基配对概率的最终集合来预测结构。与其他多序列结构预测方法相比,与联合比对和折叠算法相比,所提出的方法具有更高或相当的结构预测精度,并且随着输入序列数量的增加,时间和存储需求的比例也有利地增加。

著录项

  • 作者

    Harmanci, Arif Ozgun.;

  • 作者单位

    University of Rochester.;

  • 授予单位 University of Rochester.;
  • 学科 Engineering Electronics and Electrical.;Biology Bioinformatics.
  • 学位 Ph.D.
  • 年度 2010
  • 页码 218 p.
  • 总页数 218
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号