BMC Bioinformatics

High-precision high-coverage functional inference from integrated data sources


Abstract

Background: Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques; and (2) development of a decision rule for the constructed FLN to optimize functional annotation.
Results: We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods (Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network) all combine heterogeneous data to produce reliable, high-coverage FLNs, in which the linkage weight estimates the functional coupling of linked proteins more accurately than any individual data source used alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on the maximum edge weight yields the most precise annotation at high coverage. At low coverage, all rules evaluated perform comparably; above approximately 50% coverage, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas the precision of the other rules ranges from slightly more than 30% down to 3%. We also provide a scoring scheme that estimates the precision of individual predictions. Finally, tests of the robustness of the framework indicate that it can be successfully applied to less studied organisms.
Conclusion: We provide a general two-step function-annotation framework and show that high-coverage, high-precision annotation can be achieved by constructing a high-coverage, reliable FLN via data integration and then applying a maximum weight decision rule.
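
The maximum weight decision rule summarized above can be illustrated with a minimal sketch. The abstract does not spell out the exact rule, so this assumes one plausible reading: each unannotated protein receives the functions of its annotated neighbor with the highest edge weight. The function names, the `min_weight` threshold, and the toy proteins and weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn (protein_a, protein_b, weight) triples into an undirected adjacency map."""
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w
    return adj


def max_weight_annotate(adj, known, min_weight=0.0):
    """Assumed rule: copy the functions of the highest-weight annotated neighbor.

    Returns {protein: (functions, weight)}. Proteins with no annotated
    neighbor at or above min_weight receive no prediction.
    """
    predictions = {}
    for protein, neighbors in adj.items():
        if protein in known:
            continue  # already annotated, nothing to predict
        candidates = [
            (w, n) for n, w in neighbors.items()
            if n in known and w >= min_weight
        ]
        if not candidates:
            continue
        weight, best = max(candidates)  # strongest linkage wins
        predictions[protein] = (known[best], weight)
    return predictions


def coverage(predictions, adj, known):
    """Fraction of unannotated proteins in the network that received a prediction."""
    unannotated = [p for p in adj if p not in known]
    return len(predictions) / len(unannotated) if unannotated else 0.0


# Toy network (hypothetical proteins, weights, and function labels).
edges = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.4),
    ("P4", "P3", 0.6),
    ("P5", "P4", 0.8),
]
known = {"P2": {"DNA repair"}, "P3": {"transport"}}

adj = build_adjacency(edges)
all_preds = max_weight_annotate(adj, known)
strict_preds = max_weight_annotate(adj, known, min_weight=0.7)
```

In this toy network, P1 inherits the annotation of P2 (weight 0.9) rather than P3 (weight 0.4), and P5 gets no prediction because its only neighbor is itself unannotated. Raising `min_weight` keeps only the predictions backed by stronger links, which lowers coverage; this mirrors the coverage and precision trade-off described in the abstract, although the paper's own scoring scheme may differ.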
