Journal: Scientific Reports

Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting


Abstract

Identification of hot spots, a small portion of protein-protein interface residues that contribute the majority of the binding free energy, can provide crucial information for understanding the function of proteins and studying their interactions. Based on our previous method (PredHS), we propose a new computational approach, PredHS2, that further improves the accuracy of predicting hot spots at protein-protein interfaces. First, we build a new training dataset of 313 alanine-mutated interface residues extracted from 34 protein complexes. Then we generate 600 sequence, structure, exposure, and energy features, together with Euclidean and Voronoi neighborhood properties. To remove redundant and irrelevant information, we select a set of 26 optimal features using a two-step feature selection method, which consists of a minimum Redundancy Maximum Relevance (mRMR) procedure followed by a sequential forward selection process. Based on the selected 26 features, we use Extreme Gradient Boosting (XGBoost) to build our prediction model. PredHS2 outperforms other machine learning algorithms and other state-of-the-art hot spot prediction methods on the training dataset and the independent test set (BID). Several novel features, such as solvent exposure characteristics, secondary structure features, and disorder scores, are found to be more effective in discriminating hot spots. Moreover, the updated training dataset and the new feature selection and classification algorithms play a vital role in improving prediction quality.
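
The two-step feature selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes absolute Pearson correlation as a stand-in for the mutual information normally used in mRMR, a leave-one-out nearest-centroid classifier as a stand-in for XGBoost, and a small synthetic dataset in place of the 313-residue training set. All function names (`mrmr_rank`, `forward_select`, `loo_accuracy`) are hypothetical.

```python
from statistics import mean

def pearson(a, b):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def mrmr_rank(columns, labels, k):
    """Step 1: greedy mRMR-style ranking (relevance minus mean redundancy).
    Assumption: correlation replaces mutual information for brevity."""
    relevance = [pearson(col, labels) for col in columns]
    chosen, remaining = [], set(range(len(columns)))

    def score(j):
        if not chosen:
            return relevance[j]
        redundancy = mean(pearson(columns[j], columns[s]) for s in chosen)
        return relevance[j] - redundancy

    while remaining and len(chosen) < k:
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

def loo_accuracy(columns, labels, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier.
    Assumption: this simple scorer stands in for an XGBoost model."""
    n, correct = len(labels), 0
    for i in range(n):
        dists = {}
        for cls in set(labels):
            idx = [t for t in range(n) if t != i and labels[t] == cls]
            centroid = [mean(columns[j][t] for t in idx) for j in subset]
            dists[cls] = sum((columns[j][i] - c) ** 2
                             for j, c in zip(subset, centroid))
        correct += min(dists, key=dists.get) == labels[i]
    return correct / n

def forward_select(columns, labels, candidates):
    """Step 2: sequential forward selection over the mRMR candidates.
    Each round adds the candidate that most improves the score and
    stops as soon as no candidate improves it."""
    selected, best, remaining = [], 0.0, list(candidates)
    while remaining:
        scored = [(loo_accuracy(columns, labels, selected + [j]), j)
                  for j in remaining]
        top_score, top_j = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top_j)
        remaining.remove(top_j)
        best = top_score
    return selected, best

# Synthetic toy data (not from the paper): 20 samples, 3 features.
labels = [0] * 10 + [1] * 10
f0 = [lab + 0.1 * ((i % 5) - 2) for i, lab in enumerate(labels)]  # informative
f1 = [2 * v for v in f0]                                          # redundant copy of f0
f2 = [i % 2 for i in range(20)]                                   # unrelated to the label
columns = [f0, f1, f2]

ranked = mrmr_rank(columns, labels, k=2)
selected, best = forward_select(columns, labels, ranked)
print("mRMR candidates:", ranked)
print("selected features:", selected, "LOO accuracy:", best)
```

In this toy setting the redundancy penalty keeps the scaled copy of the informative feature out of the candidate list, and the forward pass then keeps only the features that raise the cross-validated score. In a pipeline like the one the abstract describes, the scoring function would presumably wrap a cross-validated XGBoost classifier instead.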
