Conference: IEEE International Conference on Bioinformatics and Biomedicine

PNAB: Prediction of protein-nucleic acid binding affinity using heterogeneous ensemble models

Abstract

Protein-nucleic acid interactions play critical roles in many biological processes. Quantifying the binding affinity of protein-nucleic acid complexes helps in understanding protein-nucleic acid recognition mechanisms and in identifying reliable binding partners. In this paper, we propose a computational approach, PNAB, which can effectively predict protein-nucleic acid binding affinity using sequence-based heterogeneous ensemble models. We build a dataset of protein-nucleic acid binding affinity that includes 103 protein-RNA complexes and 100 protein-DNA complexes manually collected from the related literature. We find that the binding affinity mainly depends on the structure of the nucleic acid molecules. Based on the type of nucleic acid bound to the protein in each complex, we divide the complexes into 11 categories (six for protein-RNA complexes and five for protein-DNA complexes). Then, we extract sequence features from the protein-nucleic acid complexes and, for each category, build a stacking heterogeneous ensemble model on the generated features. We perform a comprehensive evaluation of the proposed method on the binding affinity dataset using leave-one-out cross-validation, and we show that PNAB achieves correlations ranging from 0.84 to 0.95 across all categories, which is significantly better than other typical regression methods and the pioneering protein-nucleic acid binding affinity predictor. In addition, a user-friendly web server has been developed to predict the binding affinity of protein-RNA complexes. The PNAB web server is freely available at http://pnab.denglab.org/.
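The evaluation scheme described in the abstract (a stacking ensemble of heterogeneous regressors, scored by the correlation of leave-one-out predictions) can be illustrated with a minimal sketch. This is not the PNAB implementation: the two base learners (a k-nearest-neighbor regressor and a one-variable least-squares fit), the convex-weight meta-learner, the two-dimensional features, and the synthetic data are all assumptions chosen only to keep the example self-contained.

```python
import math
import random

def knn_fit(X, y, k=3):
    """Hypothetical base learner 1: k-nearest-neighbor regression."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict

def linear_fit(X, y):
    """Hypothetical base learner 2: least squares on the sum of the features."""
    s = [sum(row) for row in X]
    ms, my = sum(s) / len(s), sum(y) / len(y)
    var = sum((v - ms) ** 2 for v in s)
    slope = sum((v - ms) * (t - my) for v, t in zip(s, y)) / var if var else 0.0
    return lambda x: my + slope * (sum(x) - ms)

BASE_LEARNERS = [knn_fit, linear_fit]  # heterogeneous: different model families

def drop(seq, i):
    """Return a copy of seq without element i."""
    return seq[:i] + seq[i + 1:]

def stack_fit(X, y):
    """Train the stack: base learners plus a meta-learner on out-of-fold predictions."""
    # Inner leave-one-out loop: each sample is predicted by base models
    # that never saw it, so the meta-learner is not fit on leaked targets.
    oof = []
    for i in range(len(X)):
        models = [fit(drop(X, i), drop(y, i)) for fit in BASE_LEARNERS]
        oof.append([m(X[i]) for m in models])

    # Meta-learner (an assumption): one convex weight w picked by grid search.
    def sse(w):
        return sum((w * p[0] + (1 - w) * p[1] - t) ** 2 for p, t in zip(oof, y))
    w = min((j / 20 for j in range(21)), key=sse)

    # Refit the base learners on all training data for the final predictor.
    models = [fit(X, y) for fit in BASE_LEARNERS]
    return lambda x: w * models[0](x) + (1 - w) * models[1](x)

def loocv_predictions(X, y):
    """Outer leave-one-out loop: hold out each sample once and predict it."""
    return [stack_fit(drop(X, i), drop(y, i))(X[i]) for i in range(len(X))]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb)

# Synthetic stand-in data; real inputs would be sequence-derived features
# and measured binding affinities for one category of complexes.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(24)]
y = [2.0 * a + b + random.gauss(0, 0.05) for a, b in X]

preds = loocv_predictions(X, y)
r = pearson(preds, y)
print(f"LOOCV Pearson r = {r:.2f}")
```

In this sketch the held-out sample never influences either the base learners or the meta-learner weight, which is what makes the resulting correlation an out-of-sample estimate. Applying the same loop separately to each category would yield one correlation per category, matching the per-category reporting in the abstract.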