Automatic annotation of eukaryotic genes, pseudogenes and promoters

Victor Solovyev; Peter Kosarev; Igor Seledsov; Denis Vorobyev

Abstract

Background: The ENCODE gene prediction workshop (EGASP) has been organized to evaluate how well state-of-the-art automatic gene finding methods are able to reproduce the manual and experimental gene annotation of the human genome. We have used Softberry gene finding software to predict genes, pseudogenes and promoters in 44 selected ENCODE sequences representing approximately 1% (30 Mb) of the human genome. Predictions of gene finding programs were evaluated in terms of their ability to reproduce the ENCODE-HAVANA annotation.

Results: The Fgenesh++ gene prediction pipeline can identify 91% of coding nucleotides with a specificity of 90%. Our automatic pseudogene finder (PSF program) found 90% of the manually annotated pseudogenes and some new ones. The Fprom promoter prediction program identifies 80% of TATA promoter sequences with one false positive prediction per 2,000 base-pairs (bp) and 50% of TATA-less promoters with one false positive prediction per 650 bp. It can be used to identify transcription start sites upstream of annotated coding parts of genes found by gene prediction software.

Conclusions: We review our software and underlying methods for identifying these three important structural and functional genome components and discuss the accuracy of predictions, recent advances and open problems in annotating genomic sequences. We have demonstrated that our methods can be effectively used for initial automatic annotation of the eukaryotic genome.
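
The nucleotide-level figures quoted in the Results can be illustrated with a minimal sketch. The abstract does not define its metrics, so this assumes the convention common in gene-finding evaluation: sensitivity is the fraction of annotated coding nucleotides that are predicted as coding, and specificity is the fraction of predicted coding nucleotides that are annotated as coding. The function names and the toy coordinates are hypothetical and are not taken from the paper or from Softberry software.

```python
def coding_positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed definitions (not stated in the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    truth = coding_positions(annotated)
    calls = coding_positions(predicted)
    true_positives = len(truth & calls)
    sensitivity = true_positives / len(truth) if truth else 0.0
    specificity = true_positives / len(calls) if calls else 0.0
    return sensitivity, specificity


# Hypothetical toy coordinates, invented for illustration only.
annotated_exons = [(101, 200), (301, 400)]
predicted_exons = [(111, 200), (301, 410)]

sn, sp = nucleotide_accuracy(annotated_exons, predicted_exons)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Expanding intervals into position sets keeps the sketch short; for sequences on the scale of the 30 Mb ENCODE regions, an interval-overlap computation would be the more practical choice.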

Bibliographic record

Source: Genome Biology, 2006, Issue 1, 12 pages
Authors: Victor Solovyev; Peter Kosarev; Igor Seledsov; Denis Vorobyev
Original format: PDF
Text language: English (eng)
Chinese Library Classification: Genetics

