...
BMC Bioinformatics

Predicting protein linkages in bacteria: Which method is best depends on task

Abstract

Background
Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have proven to be powerful tools. This paper provides guidelines for when each method is appropriate by exploring the distinct features of each method and the potential improvements offered by combining them. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations.

Results
Using Escherichia coli K12 and Bacillus subtilis, we evaluated the linkage predictions made by each method against three benchmarks:
- functional categories defined by COG and KEGG
- known pathways listed in EcoCyc
- known operons listed in RegulonDB

Each method had strengths and weaknesses, and no single method dominated every aspect of predictive ability studied.

For functional categories, the results agree with previous studies. The Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories. The Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. The differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Function prediction was better when linkages were combined with weights based on their reliability than when the linkage sets were merged as a simple unweighted union.

For pathway reconstruction, linkages from at least one method covered 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins.
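The abstract reports that weighting linkages by reliability beat a plain union for function prediction, but it does not spell out the weighting scheme. The sketch below assumes a simple guilt-by-association vote:
- Each method gets a hypothetical reliability weight (`RELIABILITY`).
- A protein pair's score is the sum of the weights of the methods that predict it.
- A protein takes the function with the highest summed score among its linked, annotated partners.

The helper names, weights, and toy proteins are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical per-method reliability weights (e.g. each method's precision on a
# benchmark). These numbers are illustrative, not values from the paper.
RELIABILITY = {
    "gene_cluster": 0.8,
    "gene_neighbor": 0.7,
    "rosetta_stone": 0.5,
    "phylo_profile": 0.3,
}


def _pairs(pairs):
    """Unordered protein pairs as frozensets, with self-links dropped."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def combine_weighted(linkage_sets, reliability):
    """Score each pair by summing the reliability of every method that predicts it."""
    scores = defaultdict(float)
    for method, pairs in linkage_sets.items():
        for pair in _pairs(pairs):
            scores[pair] += reliability[method]
    return dict(scores)


def combine_union(linkage_sets):
    """Unweighted union: every pair predicted by any method gets the same score."""
    return {pair: 1.0 for pairs in linkage_sets.values() for pair in _pairs(pairs)}


def predict_function(protein, scores, annotations):
    """Guilt by association: linked partners vote for their annotated functions,
    each vote weighted by the pair's score. Returns None if nothing votes."""
    votes = defaultdict(float)
    for pair, score in scores.items():
        if protein in pair:
            (partner,) = pair - {protein}
            for function in annotations.get(partner, ()):
                votes[function] += score
    return max(votes, key=votes.get) if votes else None


# Hypothetical toy example: protein names and annotations are made up.
linkages = {
    "gene_cluster": [("P1", "P2")],
    "phylo_profile": [("P1", "P3"), ("P1", "P4")],
}
annotations = {"P2": ("transport",), "P3": ("metabolism",), "P4": ("metabolism",)}

print(predict_function("P1", combine_weighted(linkages, RELIABILITY), annotations))
print(predict_function("P1", combine_union(linkages), annotations))
```

In this toy case, the single high-reliability Gene cluster link outweighs two weaker Phylogenetic profile links. The weighted scheme therefore predicts "transport", while the unweighted union predicts "metabolism".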
Gene neighbor was individually the most effective method for pathway reconstruction, reconstructing 48 complete pathways.

For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. We compared two versions of the E. coli K12 operon database. Many predictions that were unannotated in the earlier version had become true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, consistent with previous studies, combining features based on intergenic regions and protein function improved the specificity of operon prediction.

Conclusion
A common problem for computational methods is the generation of a large number of false positives, which may be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated dramatic differences in the reported results. Using several benchmarks, we have shown the comparative effectiveness of each prediction method and provided guidelines on which method is most appropriate for a given prediction task.
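The operon findings depend on precision in two ways:
- Restricting predictions to linkages found by both Gene cluster and Gene neighbor trades recall for precision.
- Scoring against an incomplete benchmark counts some true links as false positives.

Here is a minimal sketch of both effects. It assumes linkages are unordered gene pairs and that precision and recall are computed per pair, a simplification of the per-operon figures (59%, 88%) reported above. The gene names, the two benchmark "versions", and the helper names are all hypothetical.

```python
def normalize(pairs):
    """Unordered gene pairs as frozensets; self-links dropped."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def precision_recall(predicted, known):
    """Pairwise precision and recall of a predicted linkage set against a benchmark set."""
    tp = len(predicted & known)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(known) if known else 0.0
    return precision, recall


# Hypothetical toy data: gene names and pairs are illustrative only.
gene_cluster = normalize([("g1", "g2"), ("g2", "g3"), ("g4", "g5"), ("g7", "g8")])
gene_neighbor = normalize([("g1", "g2"), ("g2", "g3"), ("g5", "g6"), ("g9", "g10")])
# Same-operon pairs in an "earlier" benchmark, and a "later" release that adds g7-g8.
known_v1 = normalize([("g1", "g2"), ("g2", "g3"), ("g4", "g5"), ("g5", "g6")])
known_v2 = known_v1 | normalize([("g7", "g8")])

union = gene_cluster | gene_neighbor
intersection = gene_cluster & gene_neighbor

for label, predicted in [("union", union), ("intersection", intersection)]:
    for version, known in [("v1", known_v1), ("v2", known_v2)]:
        p, r = precision_recall(predicted, known)
        print(f"{label:12s} vs {version}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, the intersection reaches precision 1.00 at recall 0.50 against `known_v1`, while the union has precision 0.67. Scoring the union against `known_v2`, which adds the g7-g8 pair, raises its precision to 0.83 without any change to the predictions.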