Journal: PLoS Computational Biology

Evolutionary Triplet Models of Structured RNA

Abstract

The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms for parsing this grammar that are robust to null cycles and empty bifurcations. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, we implemented the algorithms for maximum likelihood (ML) alignment of three known RNA structures given a known phylogeny, together with inference of the common ancestral structure. We compared this ML algorithm to a variety of related but simpler techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model, and a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, which motivates the use of sum-over-pairs heuristics (exact or approximate) where possible. For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.
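To make the idea of posterior decoding concrete, the sketch below shows a generic pairwise version: given a matrix of match posteriors, dynamic programming picks the alignment with the largest expected number of correctly aligned positions. This is an illustration only, not the paper's implementation. The function name `mea_alignment` and the toy matrix `post` are assumptions introduced here, and the sketch ignores basepair structure and phylogeny entirely.

```python
from typing import List, Tuple


def mea_alignment(post: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Posterior-decoding (maximum expected accuracy) pairwise alignment.

    post[i][j] is an ASSUMED posterior probability that position i of
    sequence X is aligned to position j of sequence Y. In practice it would
    come from a forward-backward (or inside-outside) pass over some model;
    here it is simply taken as input.
    """
    n = len(post)
    m = len(post[0]) if n else 0

    # score[i][j]: best expected number of correct matches for X[:i], Y[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + post[i - 1][j - 1],  # align i with j
                score[i - 1][j],                           # gap in Y
                score[i][j - 1],                           # gap in X
            )

    # Traceback, preferring gaps on ties so that zero-posterior pairs
    # are not reported as matches.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j]:
            i -= 1
        elif score[i][j] == score[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    # Toy posteriors for a 2-residue X against a 3-residue Y (made-up values).
    post = [
        [0.9, 0.05, 0.0],
        [0.1, 0.2, 0.7],
    ]
    total, pairs = mea_alignment(post)
    print(total, pairs)
```

A sum-over-pairs heuristic for three sequences would, under the same assumptions, apply a decoding of this kind to each pair of sequences and combine the results, rather than solving the full three-way problem exactly.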
