Nucleic Acids Research

A question of size: the eukaryotic proteome and the problems in defining it

Abstract

We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the 'current' proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences ('the orfome'). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome, the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate a range for the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes ('dead' genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at http://genecensus.org/yeast/orform).
