Conference: International Conference on Intelligent Systems for Molecular Biology

Annotation of bacterial genomes using improved phylogenomic profiles


Abstract

Motivation: Phylogenomic profiling is a large-scale comparative genomic method that infers protein function from evolutionary information; it was first described in a binary form by Pellegrini et al. (1999). Here, we propose several improvements to this approach: the use of normalized Blastp bit scores; a normalization of the matrix of profiles that takes into account the evolutionary distances between bacteria; the definition of a phylogenomic neighborhood based on continuous pairwise distances between genes; and an original annotation procedure that includes the computation of a p-value for each functional assignment.

Results: The method presented here increases the number of EcoCyc enzymes identified as being evolutionarily related by about 25% with respect to the original binary (absent/present) method. The fraction of 'false' positives is shown to be smaller than 20%. Based on their phylogenomic relationships, genes of unknown function can then be automatically related to annotated genes. Each predicted gene annotation is associated with a p-value, i.e. the probability of obtaining it by chance. The validity of this method was extensively tested on a large set of genes of known function using the MultiFun database. We find that 50% of the 3122 function attributions that can be made at a p-value level of 10^-11 correspond to the actual gene annotation. The method can be readily applied to any newly sequenced microbial genome. In contrast to earlier work on the same topic, our approach avoids the use of arbitrary cut-off values and provides a reliability estimate of the functional predictions in the form of p-values.
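
The abstract does not give formulas, so the following Python sketch only illustrates the general shape of such a pipeline under stated assumptions; it is not the authors' implementation. It assumes that each bit score is normalized by the query's self-hit score, that profiles are compared by Euclidean distance, and that a functional assignment is scored with a hypergeometric tail probability. All function names and the choice of test are hypothetical, and the normalization for evolutionary distances between bacteria is not covered.

```python
import math

def normalized_profile(bit_scores, self_score):
    """Scale best-hit bit scores (one per genome) by the query's self-hit
    score and clip to [0, 1]. This normalization is an assumption."""
    return [min(max(s / self_score, 0.0), 1.0) for s in bit_scores]

def profile_distance(p, q):
    """Euclidean distance between two continuous profiles (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def ranked_neighbors(target, profiles):
    """Return all other genes ordered by increasing distance to `target`.
    `profiles` maps gene name -> normalized profile."""
    ref = profiles[target]
    others = [g for g in profiles if g != target]
    return sorted(others, key=lambda g: profile_distance(ref, profiles[g]))

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k genes with a given function
    among n neighbors, when K of the N genes carry that function.
    The hypergeometric test is an illustrative choice, not the paper's."""
    upper = min(n, K)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return tail / math.comb(N, n)

# Hypothetical toy data: three genes profiled across two genomes.
profiles = {
    "geneA": normalized_profile([100.0, 0.0], self_score=100.0),
    "geneB": normalized_profile([90.0, 10.0], self_score=100.0),
    "geneC": normalized_profile([0.0, 100.0], self_score=100.0),
}
neighbors = ranked_neighbors("geneA", profiles)
```

In this sketch a small p-value means the function is over-represented among the nearest genes, which is the kind of reliability estimate the abstract describes.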
