Extensive complementarity between gene function prediction methods

Abstract

Motivation: The number of sequenced genomes rises steadily, but we still lack knowledge of the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesize that AFP approaches which draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions.

Results: Our pipeline amalgamates 5,133,543 genes from 2,071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1,227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse AFP methods display a striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on the single most-confident prediction per gene/function, instead of enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known E. coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods, or by 11 additional bits/gene upon integration, thereby providing a highly ranked predictor on the CAFA2 community benchmark. The availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit gained from integrating them.
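The integration rule described in the abstract, keeping one most-confident prediction per gene/function rather than requiring several methods to agree, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes every method's scores are already calibrated to a common scale (for example, estimated precision in [0, 1]), and the function names, method names, gene identifiers and scores are all hypothetical. A consensus rule is included only for contrast.

```python
from typing import Dict, List, Tuple

# A (gene, GO term) pair; identifiers used below are hypothetical placeholders.
Key = Tuple[str, str]


def integrate_max_confidence(
    predictions: Dict[str, Dict[Key, float]],
) -> Dict[Key, Tuple[float, str]]:
    """For every (gene, GO term) pair, keep the single most confident
    prediction made by any method, together with that method's name.

    Assumes scores from all methods are already on a common, calibrated
    scale; this sketch does not perform the calibration itself.
    """
    best: Dict[Key, Tuple[float, str]] = {}
    for method, preds in predictions.items():
        for key, score in preds.items():
            if key not in best or score > best[key][0]:
                best[key] = (score, method)
    return best


def integrate_consensus(
    predictions: Dict[str, Dict[Key, float]], min_methods: int = 2
) -> Dict[Key, float]:
    """Contrasting strategy: keep only pairs predicted by at least
    `min_methods` methods, scored by their mean confidence."""
    support: Dict[Key, List[float]] = {}
    for preds in predictions.values():
        for key, score in preds.items():
            support.setdefault(key, []).append(score)
    return {
        key: sum(scores) / len(scores)
        for key, scores in support.items()
        if len(scores) >= min_methods
    }


if __name__ == "__main__":
    # Toy data: method names, genes and scores are invented for illustration.
    preds = {
        "method_A": {("geneA", "GO:0006412"): 0.9},
        "method_B": {("geneB", "GO:0006412"): 0.8,
                     ("geneA", "GO:0006412"): 0.4},
        "method_C": {("geneC", "GO:0009058"): 0.7},
    }
    merged = integrate_max_confidence(preds)
    consensus = integrate_consensus(preds)
    print(len(merged), len(consensus))  # 3 1
```

With this toy input the max-confidence rule keeps all three gene/term pairs, whereas the consensus rule keeps only the single pair that two methods happen to share. When methods cover largely non-overlapping genes and functions, as the abstract reports, enforcing agreement discards most of the available predictions.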
