A machine learning approach towards the prediction of protein-ligand binding affinity based on fundamental molecular properties

Abstract

There is a pressing need to transform the enormous amount of biological data available in various forms into meaningful knowledge. We apply machine learning (ML) models to the protein-ligand binding affinity data already available in order to predict binding affinities that are not yet known. ML methods are appreciably faster and cheaper than traditional experimental methods or computational scoring approaches. Such prediction requires training data with sufficient, unbiased features and a prediction model that fits the data well. In our study, we applied the Random Forest and Gaussian process regression algorithms from the Weka package to protein-ligand binding affinity data comprising protein and ligand binding information from the PDBbind database. The models are trained on selected fundamental properties of both proteins and ligands, which can be readily fetched from online databases or calculated when the structure is available. The models were assessed using the correlation coefficient (R²) and the root mean square error (RMSE). The Random Forest model gave an R² of 0.76 and an RMSE of 1.31. We also applied our features and prediction models to the dataset used by others and found that our model with our features outperformed the existing ones.
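
The two assessment metrics named in the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' Weka workflow. It assumes that R² means the squared Pearson correlation between measured and predicted affinities, which the abstract does not spell out. The function names and the toy affinity values are hypothetical and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Squared Pearson correlation (assumed meaning of R² here)."""
    return pearson_r(y_true, y_pred) ** 2


if __name__ == "__main__":
    # Hypothetical binding affinities; NOT data from the paper.
    measured = [4.0, 5.5, 6.1, 7.3, 8.0]
    predicted = [4.4, 5.1, 6.5, 6.9, 8.3]
    print("R^2 :", round(r_squared(measured, predicted), 3))
    print("RMSE:", round(rmse(measured, predicted), 3))
```

A higher R² together with a lower RMSE indicates closer agreement between predicted and measured affinities, which is how the reported values of 0.76 and 1.31 should be read.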

Bibliographic details

  • Source
    RSC Advances | 2018, Issue 22 | 11 pages
  • Author affiliations

    Maulana Abul Kalam Azad Univ Technol Dept Bioinformat Kolkata India;

    Indian Stat Inst Kolkata India;

    Maulana Abul Kalam Azad Univ Technol Kolkata India;

  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Chemistry
