Scientific Reports

Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II


Abstract

Assessing hERG liability in the early stages of drug discovery programs is important. The recent growth of hERG-related information in public databases has enabled various successful applications of machine learning techniques to predict hERG inhibition. However, most of these studies constructed their datasets from a single database, limiting the predictability and scope of the resulting models. In this study, a hERG classification model was constructed using the largest hERG inhibition dataset, built by integrating multiple databases. The integrated dataset consisted of more than 291,000 structurally diverse compounds derived from ChEMBL, GOSTAR, PubChem, and hERGCentral. The prediction model was built with a support vector machine (SVM), using descriptor selection based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to find the descriptor set that maximizes prediction performance with the fewest descriptors. The SVM classification model using 72 selected descriptors and ECFP_4 structural fingerprints achieved a kappa statistic of 0.733 and an accuracy of 0.984 on the test set, substantially outperforming current commercial applications for hERG prediction. Finally, the applicability domain of the prediction model was assessed based on the molecular similarity between the training set and test set compounds.
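The abstract frames descriptor selection as a two-objective problem: maximize prediction performance while minimizing the number of descriptors. The step that gives NSGA-II its name is non-dominated sorting, which ranks candidate descriptor subsets into Pareto fronts. The sketch below shows only that ranking step, assuming each candidate is scored as a hypothetical (kappa, descriptor count) pair; the abstract does not state which performance metric the authors optimized. The full algorithm also uses crowding distance, selection, crossover, and mutation over descriptor subsets, all omitted here. The scores are invented for illustration and are not results from the paper.

```python
from typing import List, Tuple

# Hypothetical scoring: (validation score to MAXIMIZE, descriptor count to MINIMIZE).
Score = Tuple[float, int]


def dominates(a: Score, b: Score) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def fast_non_dominated_sort(scores: List[Score]) -> List[List[int]]:
    """Partition candidate indices into Pareto fronts (front 0 = non-dominated)."""
    n = len(scores)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # candidates that i dominates
    domination_count = [0] * n                              # how many candidates dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(scores[i], scores[j]):
                dominated_by[i].append(j)
            elif dominates(scores[j], scores[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front: List[int] = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:  # all dominators already ranked
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Invented (kappa, descriptor count) pairs for five candidate descriptor subsets.
candidates = [(0.70, 40), (0.73, 72), (0.65, 20), (0.69, 80), (0.60, 25)]
print(fast_non_dominated_sort(candidates))  # → [[0, 1, 2], [3, 4]]
```

Candidates 0, 1, and 2 form the first front because each trades performance against size without being beaten on both; candidate 3 is dominated by 0 and 1, and candidate 4 by 2.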
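The abstract reports both an accuracy of 0.984 and a kappa statistic of 0.733. A gap like this typically appears when one class dominates the data, because Cohen's kappa discounts the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement (accuracy) and p_e is chance agreement computed from the marginal totals. The sketch below uses a hypothetical helper `cohens_kappa` and invented confusion-matrix counts; the paper's actual class balance is not given in the abstract.

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a binary confusion matrix: (p_o - p_e) / (1 - p_e)."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement, i.e. accuracy
    # Chance agreement from the row (predicted) and column (true) marginals.
    pred_pos, pred_neg = tp + fp, fn + tn
    true_pos, true_neg = tp + fn, fp + tn
    p_e = (pred_pos * true_pos + pred_neg * true_neg) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented, imbalanced test set: 40 inhibitors among 1,000 compounds.
print(round(cohens_kappa(tp=30, fp=6, fn=10, tn=954), 3))  # accuracy 0.984 → kappa 0.781
# A classifier that always predicts "non-inhibitor" is 96% accurate but has kappa 0.
print(cohens_kappa(tp=0, fp=0, fn=40, tn=960))  # → 0.0
```

The second call shows why kappa is the more informative headline number on imbalanced data: trivial majority-class prediction earns high accuracy but no credit beyond chance.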
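The abstract says the applicability domain was assessed from molecular similarity between training and test compounds, without naming the similarity measure. A common choice for ECFP-style fingerprints is Tanimoto similarity, so the sketch below assumes it: each test compound is scored by its highest Tanimoto similarity to any training compound and treated as in-domain above a cutoff. The helpers `tanimoto` and `in_domain`, the 0.5 threshold, and the bit sets are all hypothetical; real ECFP_4 fingerprints would come from a cheminformatics toolkit rather than hand-written sets.

```python
from typing import FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of "on" bits in a fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def max_similarity_to_training(query: Fingerprint, training: List[Fingerprint]) -> float:
    """Highest similarity between the query and any training compound."""
    return max(tanimoto(query, t) for t in training)


def in_domain(query: Fingerprint, training: List[Fingerprint], threshold: float = 0.5) -> bool:
    """Hypothetical rule: in-domain if the nearest training neighbour meets the threshold."""
    return max_similarity_to_training(query, training) >= threshold


training = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
test_close = frozenset({1, 2, 3, 5})  # shares 3 of 5 total bits with training[0]
test_far = frozenset({20, 21, 22})    # shares no bits with the training set
print(in_domain(test_close, training), in_domain(test_far, training))  # → True False
```

Using the nearest-neighbour similarity keeps the check cheap and interpretable; the threshold would need to be calibrated against how prediction accuracy degrades as similarity falls.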