Journal of Bioinformatics and Computational Biology

Issues in performance evaluation for host-pathogen protein interaction prediction


Abstract

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein-protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine-learning-based methods for predicting host-pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between the performance it reports and the performance reported by an alternative evaluation scheme called leave-one-pathogen-protein-out (LOPO) cross-validation. LOPO models the real-world use of HPI predictors more faithfully, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics, such as the areas under the precision-recall and receiver operating characteristic curves, are not intuitive to biologists, and we propose simpler and more directly interpretable metrics for this purpose.
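
The difference between the two evaluation schemes can be illustrated with a small sketch. This is not the authors' code: the toy protein pairs, the function names (`kfold_splits`, `lopo_splits`, `leaked_pathogens`), and the fold count are all assumptions made for illustration. The sketch only shows how the splits are formed and checks whether a pathogen protein in the test fold also appears in the training fold, which is the situation K-fold permits and LOPO rules out.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (host protein, pathogen protein, interaction label).
pairs = [
    ("H1", "P1", 1), ("H2", "P1", 0), ("H3", "P1", 1),
    ("H1", "P2", 0), ("H2", "P2", 1), ("H3", "P2", 0),
    ("H1", "P3", 1), ("H2", "P3", 0), ("H3", "P3", 0),
]

def kfold_splits(n_examples, k, shuffle=True, seed=0):
    """Plain K-fold: deal example indices into k folds, ignoring which
    pathogen protein each example belongs to."""
    idx = list(range(n_examples))
    if shuffle:
        random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        yield train, test

def lopo_splits(examples):
    """LOPO: each fold holds out every example of one pathogen protein."""
    by_pathogen = defaultdict(list)
    for i, (_host, pathogen, _label) in enumerate(examples):
        by_pathogen[pathogen].append(i)
    for pathogen, test in by_pathogen.items():
        held = set(test)
        train = [i for i in range(len(examples)) if i not in held]
        yield pathogen, train, test

def leaked_pathogens(examples, train, test):
    """Pathogen proteins present in both the training and the test fold."""
    train_p = {examples[i][1] for i in train}
    test_p = {examples[i][1] for i in test}
    return train_p & test_p

for train, test in kfold_splits(len(pairs), k=3):
    print("K-fold shared pathogens:", sorted(leaked_pathogens(pairs, train, test)))
for pathogen, train, test in lopo_splits(pairs):
    print("LOPO", pathogen, "shared pathogens:", sorted(leaked_pathogens(pairs, train, test)))
```

Under LOPO the set of shared pathogen proteins is empty for every fold by construction, so the predictor is always scored on a pathogen protein whose interaction partners it never saw. With scikit-learn available, `sklearn.model_selection.LeaveOneGroupOut` with the pathogen protein as the group label should produce the same kind of split.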