Journal of Bioinformatics and Computational Biology

Protein-protein interaction site prediction using random forest proximity distance


Abstract

A front-end method based on random forest proximity distance (PD) is used to screen the test set and thereby improve protein-protein interaction site (PPIS) prediction. Distance metrics are assessed under the assumption that a higher-quality distance definition leads to higher classification performance. On an independent test set, numerical analysis based on statistical inference shows that PD outperforms the Mahalanobis and cosine distances. Because the proximity distance depends on the tree composition of the random forest model, an iterative method is designed to optimize it: the method adjusts the tree composition by adjusting the size of the training set. The iterative method yields two PD metrics, 75PD and 50PD. On two independent test sets, 75PD achieved higher Matthews correlation coefficient and F1 score values than the PD produced by the original training set, and the differences were statistically significant. All numerical experiments show that the closer the test data are to the training data, the better the predictor's results. These findings indicate that the iterative method can optimize the definition of proximity distance and that the distance information provided by PD can be used to indicate the reliability of prediction results.
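
The abstract does not spell out how the proximity distance is computed, so the following is only a minimal sketch under stated assumptions. It uses the common definition of random forest proximity (the fraction of trees in which two samples land in the same leaf), takes the distance as one minus that proximity, and scores a test sample by its nearest training sample. The nearest-sample aggregation, the function names `proximity`, `proximity_distance` and `screen`, the toy leaf indices, and the threshold of 0.5 are all hypothetical choices for illustration and may differ from the paper's method.

```python
from typing import List, Sequence


def proximity(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Fraction of trees in which two samples fall into the same leaf."""
    if len(leaves_a) != len(leaves_b) or len(leaves_a) == 0:
        raise ValueError("each sample needs one leaf index per tree")
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def proximity_distance(test_leaves: Sequence[int],
                       train_set_leaves: Sequence[Sequence[int]]) -> float:
    """Assumed definition: 1 minus the proximity to the closest training sample."""
    return 1.0 - max(proximity(test_leaves, t) for t in train_set_leaves)


def screen(test_set_leaves: Sequence[Sequence[int]],
           train_set_leaves: Sequence[Sequence[int]],
           threshold: float) -> List[int]:
    """Return indices of test samples whose distance is at most `threshold`."""
    return [i for i, leaves in enumerate(test_set_leaves)
            if proximity_distance(leaves, train_set_leaves) <= threshold]


# Toy leaf indices for a 4-tree forest (rows: samples, columns: trees).
train_leaves = [[1, 3, 2, 5], [1, 4, 2, 6]]
test_leaves = [[1, 3, 2, 6], [7, 8, 9, 9]]

distances = [proximity_distance(t, train_leaves) for t in test_leaves]
kept = screen(test_leaves, train_leaves, threshold=0.5)  # arbitrary cutoff
```

In practice the leaf indices would come from a fitted forest. With scikit-learn, for example, `RandomForestClassifier.apply(X)` returns one leaf index per tree for each row of `X`. Retraining the forest on a subset of the training data changes those leaf assignments, which is the dependence on tree composition that the abstract's iterative method relies on.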
