Journal: Genome Research

Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and 'resurrected' pseudogenes in the mouse genome.



Abstract

Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching in excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene encodes a fusion protein that spans the Ins2 and Igf2 loci, producing a transcript that encodes the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into gene structure and propagation in the mouse genome. All these data have subsequently been used to improve the publicly available mouse annotation in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).
