Journal: Nature reviews neuroscience

Toward completion of the Earth's proteome: an update a decade later

Abstract

Protein databases are growing steadily, driven by the spread of new, more efficient sequencing techniques. This growth is dominated by an increase in redundancy (homologous proteins with various degrees of sequence similarity) and by the inability to process and curate sequence entries as fast as they are created. To understand these trends and to aid bioinformatic resources that might be compromised by the increasing size of the protein sequence databases, we have created a less-redundant protein data set. In parallel, we analyzed the evolution of protein sequence databases in terms of size and redundancy. While the SwissProt database has decelerated its growth, mostly because of a focus on increasing the level of annotation of its sequences, its counterpart TrEMBL, much less limited by curation steps, is still in a phase of accelerated growth. However, we predict that before 2020, almost all entries deposited in UniProtKB will be homologous to known proteins. We propose that new sequencing projects can be made more useful if they are directed toward sequencing voids, that is, parts of the tree of life far from already sequenced species or model organisms. We show that these voids are present in the Archaea and Eukarya domains of life. As the redundancy of new protein sequence entries approaches certainty, it becomes reasonable to consider that most of the protein diversity on Earth has already been described; we estimate this diversity at around 3.75 million proteins, revising down the prediction we made a decade ago.
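The abstract does not say how the less-redundant data set was built. As a rough illustration of the general idea only, the sketch below applies a greedy filter that keeps a sequence when it is dissimilar to every representative kept so far. The function names, the 0.5 threshold, the toy sequences, and the use of `difflib.SequenceMatcher` as a stand-in for alignment-based sequence identity are all assumptions made for this example, not details from the paper.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for real sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reduce_redundancy(sequences: Dict[str, str], threshold: float = 0.5) -> List[str]:
    """Greedy filter (hypothetical helper): keep a sequence only if its
    similarity to every representative kept so far is below `threshold`."""
    representatives: List[str] = []
    # Visit longest sequences first (ties broken by id), so each group of
    # similar sequences is represented by its longest member.
    for seq_id in sorted(sequences, key=lambda k: (-len(sequences[k]), k)):
        seq = sequences[seq_id]
        if all(similarity(seq, sequences[r]) < threshold for r in representatives):
            representatives.append(seq_id)
    return representatives


# Toy input, invented for illustration only.
toy = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSRA",  # one substitution relative to p1
    "p3": "GGGGGGGGGGGGGGGGGGGGGG",  # shares no residues with p1 or p2
}
kept = reduce_redundancy(toy)
print(kept)
```

On the toy input, `p2` is dropped as redundant with `p1`, while `p3` is kept. A real pipeline would replace `similarity` with an alignment-based measure and would avoid the all-against-all comparison, which scales quadratically with the number of representatives.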
