Journal: The Journal of Membrane Biology: An International Journal for Studies on the Structure, Function & Genesis of Biomembranes
Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures


Abstract

Accurately predicting protein-protein interaction (PPI) sites is an active research topic because such predictions help in understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been widely used for PPI site prediction. However, traditional machine learning algorithms often assume that the classes are balanced, and applying them directly often leads to poor performance because the PPI site prediction problem is severely class-imbalanced. In this study, we propose a novel method that improves PPI site prediction by relieving the class imbalance with a data-cleaning procedure and reducing false positives with a post-filtering procedure. First, a machine-learning-based data-cleaning procedure removes marginal targets from the majority-class samples; these targets may hinder training a model with a clear classification boundary, and removing them relieves the class imbalance in the original training dataset. Then, a prediction model is trained on the cleaned dataset. Finally, a post-filtering procedure is applied to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which performs competitively with existing state-of-the-art sequence-based PPI site predictors and should supplement existing PPI site prediction methods.
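
The three stages described in the abstract (clean the majority class, train on the cleaned set, post-filter the predictions) can be sketched in a few lines of Python. This is only an illustrative sketch: the abstract does not say which learners, features, or filtering rule the authors use, so the nearest-centroid scorer, the `margin` threshold, the isolated-positive rule, and every function name below are assumptions made for the example.

```python
from statistics import mean

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def score(x, pos_c, neg_c):
    """Positive when x lies closer to the positive (interface) centroid."""
    return sq_dist(x, neg_c) - sq_dist(x, pos_c)

def train(X, y):
    """Toy stand-in for the prediction model: one centroid per class.
    Assumes both classes are present in (X, y)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return centroid(pos), centroid(neg)

def clean_majority(X, y, margin):
    """Stage 1 (assumed rule): score every sample with a first-pass model and
    drop majority-class (label 0) samples that are not at least `margin`
    on the negative side of the boundary. Minority samples are always kept."""
    pos_c, neg_c = train(X, y)
    kept = [(x, t) for x, t in zip(X, y)
            if t == 1 or score(x, pos_c, neg_c) < -margin]
    return [x for x, _ in kept], [t for _, t in kept]

def predict(model, X):
    """Stage 2 output: 1 = predicted interaction site, 0 = non-site."""
    pos_c, neg_c = model
    return [1 if score(x, pos_c, neg_c) > 0 else 0 for x in X]

def post_filter(labels, window=1):
    """Stage 3 (assumed rule): clear a positive prediction when no other
    positive lies within `window` residues of it along the sequence."""
    out = list(labels)
    for i, lab in enumerate(labels):
        if lab == 1:
            lo, hi = max(0, i - window), min(len(labels), i + window + 1)
            if sum(labels[lo:hi]) == 1:  # only the residue itself is positive
                out[i] = 0
    return out

# Toy data: one feature per residue, four non-sites and two sites.
X = [[0.0], [0.1], [0.2], [0.45], [0.9], [1.0]]
y = [0, 0, 0, 0, 1, 1]

X_clean, y_clean = clean_majority(X, y, margin=0.3)  # drops the marginal [0.45]
model = train(X_clean, y_clean)
raw = predict(model, [[0.0], [0.8], [0.3]])
filtered = post_filter([0, 1, 0, 0, 1, 1, 0, 1])
```

In this toy run the cleaning stage removes only the borderline majority sample, which slightly reduces the imbalance, and the post-filter clears positives that have no positive neighbour. A real pipeline would replace each stage with whatever learner and filter the paper actually specifies.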
