Conference: International Conference on Bioinformatics and Computational Biology

Improving the Specificity of Exon Prediction Using Genomic Homology

Abstract

In computational gene prediction, existing tools routinely generate large volumes of predicted (putative) exons. A common limitation of these tools is their relatively low specificity. We develop a statistical approach that substantially improves gene prediction specificity. The key idea is to apply the principle of evolutionary conservation to the putative exons. First, by exploiting the homology between the genomes of two related species, a probability model is developed for the pattern of evolutionary conservation across genomes. Second, a probability model for the dependency between adjacent codons/triplets is added to distinguish exons from random sequences. Finally, a log odds ratio is used to classify putative exons as either coding exons or non-coding regions. When tested on pre-aligned human-mouse sequences in which the putative exons were predicted by GENSCAN and TWINSCAN, the proposed method improves exon specificity by 73% and 32%, respectively, while the loss in sensitivity is ≤ 1%. The method also retains 98% of the RefSeq gene structures that are correctly predicted by TWINSCAN while removing 26% of the predicted genes that lie in non-coding regions. The estimated number of true exons in TWINSCAN's predictions is 157,070. The results and the executable code can be downloaded from http://www.stat.purdue.edu/~jingwu/codon/. The proposed method demonstrates an application of the evolutionary conservation principle that can serve as an additional criterion for refining many existing gene predictions.
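As a rough illustration of the log odds ratio step, the sketch below scores a putative exon under two first-order Markov models of adjacent triplets, one trained on coding sequences and one on background sequences. It is not the authors' implementation: the function names (`codons`, `train_markov`, `log_odds`, `is_coding`), the add-one smoothing, the zero threshold, and the toy training sequences are all assumptions made for illustration, and the cross-species conservation model described in the abstract is omitted entirely.

```python
import math
from collections import Counter

N_TRIPLETS = 64  # possible triplets over the alphabet A, C, G, T (assumed input alphabet)


def codons(seq):
    """Split a sequence into in-frame triplets, dropping any trailing partial triplet."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def train_markov(sequences, pseudocount=1.0):
    """Estimate P(next triplet | previous triplet) with additive smoothing.

    Hypothetical helper: returns a function prob(prev, nxt).
    """
    pair_counts = Counter()
    row_totals = Counter()
    for seq in sequences:
        triplets = codons(seq)
        for prev, nxt in zip(triplets, triplets[1:]):
            pair_counts[(prev, nxt)] += 1
            row_totals[prev] += 1

    def prob(prev, nxt):
        # Counter returns 0 for unseen keys, so unseen transitions get the smoothed floor.
        return (pair_counts[(prev, nxt)] + pseudocount) / (
            row_totals[prev] + pseudocount * N_TRIPLETS
        )

    return prob


def log_odds(seq, coding_prob, background_prob):
    """Sum of log-likelihood ratios over adjacent triplet pairs (coding vs. background)."""
    triplets = codons(seq)
    return sum(
        math.log(coding_prob(prev, nxt) / background_prob(prev, nxt))
        for prev, nxt in zip(triplets, triplets[1:])
    )


def is_coding(seq, coding_prob, background_prob, threshold=0.0):
    """Classify a putative exon as coding when its log odds exceed the (assumed) threshold."""
    return log_odds(seq, coding_prob, background_prob) > threshold


# Toy usage with made-up training sequences.
coding_model = train_markov(["ATGGCCATGGCCATGGCC"])
background_model = train_markov(["TTTAAATTTAAATTTAAA"])
print(is_coding("ATGGCCATG", coding_model, background_model))
print(is_coding("TTTAAATTT", coding_model, background_model))
```

In practice the threshold would be tuned to trade specificity against sensitivity, and the score would be combined with a term for the conservation pattern between the two aligned genomes.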