Journal: PLoS Genetics

Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs

Abstract

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue a complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs have continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To achieve accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs, and we developed a Web-based annotation interface that simplifies the annotation procedures and thereby reduces manual annotation errors. Automated coding sequence and function prediction was followed by manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing, to our knowledge, the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
