Loose ends: almost one in five human genes still have unresolved coding status

Abstract

Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.
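The proportions quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three gene counts given above; the variable names are illustrative, and reading the title's "almost one in five" as the sum of the disputed genes and the potential non-coding genes is an assumption, since the abstract does not spell out that calculation.

```python
# Gene counts quoted in the abstract.
total_coding = 22_210         # coding genes listed across the three reference sets
disputed = 2_764              # coding in one or more sets, not coding in others
potential_noncoding = 1_470   # coding in all three sets, but with non-coding-like features


def share(count: int, total: int = total_coding) -> float:
    """Return count as a fraction of the total number of listed coding genes."""
    return count / total


disputed_share = share(disputed)
# Assumption: "unresolved" in the title means disputed + potential non-coding.
unresolved_share = share(disputed + potential_noncoding)

print(f"Disputed across sets: {disputed_share:.1%}")    # 12.4%, close to one in eight
print(f"Unresolved in total:  {unresolved_share:.1%}")  # 19.1%, almost one in five
```

Under that reading, 4234 of the 22 210 listed genes have unresolved coding status, which matches the headline figure.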

Bibliographic details

  • Source
    Nucleic Acids Research | 2018, issue 14 | 15 pages
  • Author affiliations

    Wellcome Trust Sanger Inst Hinxton CB10 1SA Cambs England;

    Univ Pompeu Fabra Comparat Genom Lab Inst Biol Evolut Barcelona Spain;

    MIT Comp Sci & Artificial Intelligence Lab 77 Massachusetts Ave Cambridge MA 02139 USA;

    Spanish Natl Canc Res Ctr Bioinformat Unit Madrid Spain;

    Barcelona Supercomp Ctr Computat Biol Life Sci Grp Barcelona Spain;

    Ctr Nacl Invest Cardiovasc Cardiovasc Prote Lab Madrid Spain;

    Ctr Nacl Invest Cardiovasc Cardiovasc Prote Lab Madrid Spain;

    Spanish Natl Canc Res Ctr Bioinformat Unit Madrid Spain;

  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification: Biochemistry
