Journal: mSystems

Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage

Abstract

Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagating errors and underestimating genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When this strategy is applied, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis using available prokaryotic (proteome) data sets holds great promise to assist in experimentally based genome annotation.

IMPORTANCE Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that advanced (re)analysis of omics data achieves higher proteome coverage and sensitive detection of unannotated ORFs, both of which can be exploited for conditional bacterial genome (re)annotation. This is especially relevant for annotating the wealth of sequenced prokaryotic genomes obtained in recent years.
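The rescoring step mentioned in the abstract, comparing observed fragment ion intensities with predicted ones, can be illustrated with a minimal sketch. The abstract does not say which similarity measures or tools the authors use. Pearson correlation and the normalized spectral angle are assumed here as two common choices, and all function names and the ion-label format are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def spectral_angle(x, y):
    """Normalized spectral angle: 1 for identical direction, 0 for orthogonal."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    cosine = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding drift
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def intensity_features(observed, predicted):
    """Build per-PSM features from two {ion label: intensity} mappings.

    Ions missing from one side are treated as zero intensity (an assumption).
    """
    ions = sorted(set(observed) | set(predicted))
    obs = [observed.get(ion, 0.0) for ion in ions]
    pred = [predicted.get(ion, 0.0) for ion in ions]
    return {
        "pearson": pearson(obs, pred),
        "spectral_angle": spectral_angle(obs, pred),
    }


# Hypothetical peptide-to-spectrum match with four fragment ions.
observed = {"b2": 0.30, "b3": 0.10, "y1": 1.00, "y2": 0.55}
predicted = {"b2": 0.25, "b3": 0.15, "y1": 0.95, "y2": 0.60}
features = intensity_features(observed, predicted)
```

Features of this kind would typically be appended to the search engine score for each peptide-to-spectrum match before a semi-supervised rescoring step. How the authors actually combine them is not described in the abstract.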
