Frontiers in Cell and Developmental Biology

BOW-GBDT: A GBDT Classifier Combining With Artificial Neural Network for Identifying GPCR–Drug Interaction Based on Wordbook Learning From Sequences

Abstract

Background: G protein-coupled receptors (GPCRs), a class of membrane protein receptors, are essential for normal cellular function and have been shown to be major drug targets in widespread clinical use. Identifying the GPCR targets that interact with drugs is therefore of great significance in drug development. However, identifying GPCR-drug interactions experimentally is expensive and time-consuming at large scale. As more databases of GPCR-drug pairs become publicly available, it is viable to develop machine learning models that accurately predict whether a given GPCR-drug pair interacts.

Methods: The proposed model aims to improve the accuracy of predicting GPCR-drug interactions. For GPCRs, protein sequence features are extracted with a novel bag-of-words (BoW) model improved with a weighted Silhouette Coefficient, which has been confirmed to extract more pattern information while limiting the feature dimension. For drug molecules, the Discrete Wavelet Transform (DWT) is used to extract features from the original molecular fingerprints. The two types of features are then concatenated, and the SMOTE algorithm is used to balance the training dataset. Next, an Artificial Neural Network (ANN) is used to extract features further. Finally, a Gradient Boosting Decision Tree (GBDT) model is trained on the selected features. The proposed model is named BOW-GBDT.

Results: Two datasets, D92M and Check390, are used to test BOW-GBDT. D92M, the cross-validation dataset, contains 635 interactive GPCR-drug pairs and 1225 non-interactive pairs. Check390, the independent test dataset, consists of 130 interactive GPCR-drug pairs and 260 non-interactive GPCR-drug pairs, none of which appear in D92M. According to the results, the proposed model shows better generalization ability than existing machine learning models.

Conclusion: The proposed predictor improves the accuracy of predicting GPCR-drug interactions. To make BOW-GBDT accessible to more researchers, the predictor has been deployed on a new web server, available at http://www.jci-bioinfo.cn/bowgbdt.
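
The feature-preparation steps named in the Methods (DWT on a fingerprint, concatenation with the GPCR features, SMOTE balancing) can be illustrated with a minimal standard-library sketch. Several things here are assumptions rather than details from the abstract: the Haar wavelet with a single decomposition level, the toy fingerprint and BoW vectors, the function names, and the neighbour count `k`. The abstract also does not state the target class ratio; the count below assumes a 1:1 balance over all of D92M.

```python
import math
import random


def haar_dwt(signal):
    """One level of the Haar DWT (wavelet choice is an assumption).

    Returns (approximation, detail) lists, each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / scale for a, b in pairs]
    detail = [(a - b) / scale for a, b in pairs]
    return approx, detail


def smote_oversample(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: each synthetic point is a random
    interpolation between a minority sample and one of its k nearest
    minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (x for x in minority if x is not base),
            key=lambda x: math.dist(base, x),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy inputs (hypothetical values, not taken from the paper).
fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]  # binary molecular fingerprint
gpcr_features = [0.2, 0.5, 0.3]         # BoW histogram for one GPCR

# Drug features from the DWT, then concatenation into one pair vector.
approx, detail = haar_dwt(fingerprint)
pair_vector = gpcr_features + approx + detail

# Synthetic positives needed to balance D92M 1:1 (assumed target ratio).
n_needed = 1225 - 635
```

In this sketch `pair_vector` is what a downstream ANN and GBDT would consume; those two stages are omitted because the abstract gives no architecture or hyperparameters for them.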