Computational identification of evolutionarily conserved exons

Abstract

Phylogenetic hidden Markov models (phylo-HMMs) have recently been proposed as a means of addressing a multi-species version of the ab initio gene prediction problem. These models allow sequence divergence, a phylogeny, patterns of substitution, and base composition all to be considered simultaneously, in a single unified probabilistic model. Here, we apply phylo-HMMs to a restricted version of the gene prediction problem in which individual exons are sought that are evolutionarily conserved across a diverse set of species. We discuss two new methods for improving prediction performance: (1) the use of context-dependent phylogenetic models, which capture phenomena such as a strong CpG effect in noncoding regions and a preference for synonymous rather than nonsynonymous substitutions in coding regions; and (2) a novel strategy for incorporating insertions and deletions (indels) into the state-transition structure of the model, which captures the different characteristic patterns of alignment gaps in coding and noncoding regions. We also discuss the technique, previously used in pairwise gene predictors, of explicitly modeling conserved noncoding sequence to help reduce false positive predictions. These methods have been incorporated into an exon prediction program called ExoniPhy, and tested with two large data sets. Experimental results indicate that all three methods produce significant improvements in prediction performance. In combination, they lead to prediction accuracy comparable to that of some of the best available gene predictors, despite several limitations of our current models.
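
To illustrate the general idea of a phylo-HMM, the following is a minimal toy sketch and not ExoniPhy's actual model. It assumes a star-shaped tree, the Jukes-Cantor substitution model, and two hidden states that differ only in evolutionary rate; it omits the context-dependent models, indel states, and conserved noncoding states described above. All function names, branch lengths, and rates are hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a is replaced by base b over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def column_likelihood(column, branch_lengths, rate):
    """Likelihood of one alignment column on a star tree.

    On a star tree, Felsenstein's pruning algorithm reduces to a sum over
    the unknown root base. Gap characters ('-') are treated as missing data.
    """
    total = 0.0
    for root in BASES:
        p = 0.25  # uniform equilibrium base frequency (assumption)
        for base, t in zip(column, branch_lengths):
            if base in BASES:
                p *= jc_prob(root, base, rate * t)
        total += p
    return total

def viterbi(columns, branch_lengths, rates, stay=0.9):
    """Most likely state path; state k emits columns from the tree scaled by rates[k].

    Requires at least two states. `stay` is the self-transition probability.
    """
    n = len(rates)
    log_trans = [[math.log(stay if i == j else (1.0 - stay) / (n - 1))
                  for j in range(n)] for i in range(n)]
    emit = [[math.log(column_likelihood(c, branch_lengths, r)) for r in rates]
            for c in columns]
    score = [math.log(1.0 / n) + emit[0][k] for k in range(n)]
    back = []
    for e in emit[1:]:
        # Best predecessor state for each current state j.
        prev = [max(range(n), key=lambda i: score[i] + log_trans[i][j])
                for j in range(n)]
        score = [score[prev[j]] + log_trans[prev[j]][j] + e[j] for j in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    # Ten identical columns followed by ten divergent ones (three species).
    cols = ["AAA"] * 10 + ["ACG"] * 10
    # State 0: slowly evolving ("conserved"); state 1: neutral rate.
    print(viterbi(cols, [0.3, 0.3, 0.3], rates=[0.2, 1.0]))
```

The sketch pairs each hidden state with its own phylogenetic model, so the emission probability of an alignment column is a tree likelihood rather than a single-sequence symbol probability. A real exon predictor would use a full tree topology, richer substitution models, and a larger state space.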
