Conference: Annual International Conference on Research in Computational Molecular Biology

Computational Identification of Evolutionarily Conserved Exons


Abstract

Phylogenetic hidden Markov models (phylo-HMMs) have recently been proposed as a means of addressing a multi-species version of the ab initio gene prediction problem. These models allow sequence divergence, a phylogeny, patterns of substitution, and base composition all to be considered simultaneously in a single unified probabilistic model. Here, we apply phylo-HMMs to a restricted version of the gene prediction problem in which individual exons are sought that are evolutionarily conserved across a diverse set of species. We discuss two new methods for improving prediction performance: (1) the use of context-dependent phylogenetic models, which capture phenomena such as a strong CpG effect in noncoding regions and a preference for synonymous rather than nonsynonymous substitutions in coding regions; and (2) a novel strategy for incorporating insertions and deletions (indels) into the state-transition structure of the model, which captures the different characteristic patterns of alignment gaps in coding and noncoding regions. We also discuss the technique, previously used in pairwise gene predictors, of explicitly modeling conserved noncoding sequence to help reduce false positive predictions. These methods have been incorporated into an exon prediction program called ExoniPhy and tested with two large data sets. Experimental results indicate that all three methods produce significant improvements in prediction performance. In combination, they lead to prediction accuracy comparable to that of some of the best available gene predictors, despite several limitations of our current models.
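To make the idea of a phylo-HMM concrete, the following is a minimal sketch and not the ExoniPhy model. It assumes a toy two-state HMM ("conserved" and "neutral") whose emission probabilities for each alignment column come from Felsenstein's pruning algorithm on a fixed three-species tree. The tree, branch lengths, rate multipliers, and the `stay` transition probability are all illustrative assumptions, and a plain Jukes-Cantor substitution model stands in for the context-dependent models described in the abstract. Indels and codon structure are not modeled.

```python
import math

BASES = "ACGT"

def jc_prob(a, b, t, rate):
    """Jukes-Cantor probability that base a becomes base b over branch length t."""
    e = math.exp(-4.0 * rate * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def partial(node, column, rate):
    """Felsenstein pruning: P(leaves below node | base at node), for each base."""
    if isinstance(node, str):  # leaf, identified by species name
        return {b: 1.0 if b == column[node] else 0.0 for b in BASES}
    like = {b: 1.0 for b in BASES}
    for child, t in node:  # internal node: list of (subtree, branch length)
        below = partial(child, column, rate)
        for a in BASES:
            like[a] *= sum(jc_prob(a, b, t, rate) * below[b] for b in BASES)
    return like

def column_likelihood(tree, column, rate):
    """Likelihood of one alignment column, with a uniform base prior at the root."""
    root = partial(tree, column, rate)
    return sum(0.25 * root[b] for b in BASES)

def viterbi(tree, columns, rates, stay=0.9):
    """Most probable state path; each state is a rate-scaled copy of the tree."""
    states = list(rates)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (len(states) - 1))

    def emit(s, col):
        return math.log(column_likelihood(tree, col, rates[s]))

    def step(p, s):
        return log_stay if p == s else log_switch

    score = {s: -math.log(len(states)) + emit(s, columns[0]) for s in states}
    back = []
    for col in columns[1:]:
        new, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: score[p] + step(p, s))
            ptr[s] = best
            new[s] = score[best] + step(best, s) + emit(s, col)
        score = new
        back.append(ptr)
    path = [max(states, key=score.get)]
    for ptr in reversed(back):  # trace back from the last column to the first
        path.append(ptr[path[-1]])
    return path[::-1]

def columns_from(alignment):
    """Turn {species: aligned sequence} into a list of {species: base} columns."""
    names = list(alignment)
    return [dict(zip(names, bases)) for bases in zip(*(alignment[n] for n in names))]

# Hypothetical tree ((human:0.1, mouse:0.3):0.1, chicken:0.6) and toy alignment.
TREE = [([("human", 0.1), ("mouse", 0.3)], 0.1), ("chicken", 0.6)]
RATES = {"conserved": 0.3, "neutral": 1.0}  # assumed rate multipliers per state
ALIGNMENT = {
    "human":   "ACGTACGTACGTACGT",
    "mouse":   "ACGTACGTCATGTGCA",
    "chicken": "ACGTACGTGTACGATC",
}

path = viterbi(TREE, columns_from(ALIGNMENT), RATES)
print(path)
```

In this toy alignment the first eight columns are identical across species and the last eight differ in every species, so the decoded path labels the first half "conserved" and the second half "neutral". A realistic exon predictor would replace the two states with coding, noncoding, and conserved-noncoding states, each with its own substitution model.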
