Journal: Bioinformatics

In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists


Abstract

Motivation: Prediction of the coding potential of stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences stems from the structure of coding sequences, which are organized in codons, and from the biased usage of those codons. For statistical reasons, the longer the sequence, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential.

Results: Here, we present novel approaches that specifically aim at a better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes.

Supplementary data: http://bioinformatics.psb.ugent.be/

Contact: yvan.saeys@psb.ugent.be
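The Motivation's point that codon bias is easier to detect in longer sequences can be illustrated with a toy scorer. The sketch below is not the method of the paper, nor one of the Markov models it is compared against; it is a simple codon-usage log-odds score, and the function names, the fixed reading frame 0, and the add-one pseudocount are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

def codon_counts(seq):
    """Count codons in reading frame 0 of a DNA string (assumed frame)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

def train_log_odds(coding_seqs, noncoding_seqs, pseudocount=1.0):
    """Per-codon log-odds of coding versus non-coding usage.

    The pseudocount (an illustrative choice) keeps unseen codons finite.
    """
    coding, noncoding = Counter(), Counter()
    for s in coding_seqs:
        coding.update(codon_counts(s))
    for s in noncoding_seqs:
        noncoding.update(codon_counts(s))
    coding_total = sum(coding[c] for c in CODONS) + pseudocount * len(CODONS)
    noncoding_total = sum(noncoding[c] for c in CODONS) + pseudocount * len(CODONS)
    return {
        c: math.log((coding[c] + pseudocount) / coding_total)
           - math.log((noncoding[c] + pseudocount) / noncoding_total)
        for c in CODONS
    }

def coding_score(seq, log_odds):
    """Sum of per-codon log-odds; codons with ambiguous bases contribute 0."""
    return sum(n * log_odds.get(codon, 0.0)
               for codon, n in codon_counts(seq).items())

# Tiny made-up training data, purely for demonstration.
log_odds = train_log_odds(["GCCGCCGCC"], ["ATAATAATA"])
short_score = coding_score("GCC" * 2, log_odds)
long_score = coding_score("GCC" * 10, log_odds)
```

Because the score is a sum over codons, a sequence with the same bias accumulates more evidence as it grows, so `long_score` exceeds `short_score` here. A short exon contributes only a few terms, which is why its score is harder to separate from that of a non-coding fragment.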
