Journal: BMC Bioinformatics

Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach

Abstract

BACKGROUND: Protein-DNA interactions govern a large number of cellular processes, and they can be altered by a small fraction of interface residues, the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical to understanding the principles of protein-DNA interactions. Some computational methods can already predict a large number of hot-spot residues accurately and efficiently. However, the scarcity of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the features employed limit the performance of existing methods.

RESULTS: Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (short for Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (one benchmark for model training and one independent dataset for validation), which together comprise 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases with a strict process of redundancy removal. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. The results show that our approach outperforms the existing ones.

CONCLUSIONS: PreHots, which is based on a stacked ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with existing methods, PreHots achieves better prediction performance.
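
The sequential backward feature selection mentioned in the Results can be sketched as follows. This is a minimal illustration of the general greedy technique, not the PreHots implementation: the feature names, the `WEIGHTS` table, and `toy_score` are hypothetical stand-ins, and in practice the scoring function would more likely be a cross-validated performance measure of the classifier.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[Tuple[str, ...]], float],
    n_keep: int,
) -> List[str]:
    """Greedily drop one feature per round until n_keep features remain."""
    selected = list(features)
    while len(selected) > n_keep:
        # Build every subset obtained by removing exactly one feature.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        # Keep the subset that scores best without the dropped feature.
        selected = max(candidates, key=lambda subset: score(tuple(subset)))
    return selected


# Hypothetical per-feature contributions, used only to make the sketch runnable.
WEIGHTS = {
    "sasa": 0.5,
    "conservation": 0.4,
    "hbonds": 0.3,
    "noise_a": -0.1,
    "noise_b": -0.2,
}


def toy_score(subset: Tuple[str, ...]) -> float:
    """Assumed stand-in for a cross-validated model score."""
    return sum(WEIGHTS[f] for f in subset)


kept = sequential_backward_selection(list(WEIGHTS), toy_score, n_keep=3)
print(kept)
```

With this toy scoring function the two negatively weighted features are dropped first, leaving the three informative ones.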
Both the webserver of PreHots and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/ .
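
For readers unfamiliar with the two metrics reported in the abstract, the sketch below shows how sensitivity and AUC are commonly computed from labels and classifier scores. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not data from the paper, and the paper's own evaluation code may differ.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true hot spots (label 1) that are predicted as hot spots."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = hot spot, 0 = non-hot spot.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens = sensitivity(y_true, y_pred)
area = auc(y_true, scores)
print(round(sens, 3), round(area, 3))
```

Sensitivity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason abstracts like this one report both.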