Genome Biology

Automatic annotation of eukaryotic genes, pseudogenes and promoters

Abstract

Background: The ENCODE gene prediction workshop (EGASP) was organized to evaluate how well state-of-the-art automatic gene finding methods can reproduce the manual and experimental gene annotation of the human genome. We used Softberry gene finding software to predict genes, pseudogenes and promoters in 44 selected ENCODE sequences representing approximately 1% (30 Mb) of the human genome. Predictions of gene finding programs were evaluated in terms of their ability to reproduce the ENCODE-HAVANA annotation.

Results: The Fgenesh++ gene prediction pipeline can identify 91% of coding nucleotides with a specificity of 90%. Our automatic pseudogene finder (PSF program) found 90% of the manually annotated pseudogenes and some new ones. The Fprom promoter prediction program identifies 80% of TATA promoter sequences with one false positive prediction per 2,000 base pairs (bp) and 50% of TATA-less promoters with one false positive prediction per 650 bp. It can be used to identify transcription start sites upstream of the annotated coding parts of genes found by gene prediction software.

Conclusions: We review our software and underlying methods for identifying these three important structural and functional genome components and discuss the accuracy of predictions, recent advances and open problems in annotating genomic sequences. We have demonstrated that our methods can be effectively used for initial automatic annotation of the eukaryotic genome.
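
The abstract reports nucleotide-level accuracy (91% of coding nucleotides found, 90% specificity) without defining the measures. The sketch below assumes the conventional gene-finding definitions, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP) over coding nucleotides. The function names and the toy coordinates are hypothetical and are not taken from the paper or from Softberry software.

```python
def covered_positions(intervals):
    """Expand (start, end) intervals, 1-based inclusive, into a set of positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    Assumed definitions (conventional in gene-finding evaluation):
      sensitivity = TP / (TP + FN)  -> share of annotated coding nt that were predicted
      specificity = TP / (TP + FP)  -> share of predicted coding nt that are annotated
    Both arguments are lists of coding intervals on the same sequence and strand.
    """
    ann = covered_positions(annotated)
    pred = covered_positions(predicted)
    true_positives = len(ann & pred)
    sensitivity = true_positives / len(ann) if ann else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    annotated_exons = [(101, 200), (301, 400)]
    predicted_exons = [(111, 200), (301, 400), (501, 550)]
    sn, sp = nucleotide_accuracy(annotated_exons, predicted_exons)
    print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

Expanding intervals into position sets is simple but memory-hungry; for sequences on the scale of the 30 Mb ENCODE regions, an interval-overlap computation would be a more practical choice.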