Journal: Molecular Pharmaceutics

Development of in Silico Models for Predicting P-Glycoprotein Inhibitors Based on a Two-Step Approach for Feature Selection and Its Application to Chinese Herbal Medicine Screening


Abstract

P-glycoprotein (P-gp) is regarded as an important factor in determining the ADMET (absorption, distribution, metabolism, elimination, and toxicity) characteristics of drugs and drug candidates. Successful prediction of P-gp inhibitors can thus improve our understanding of the mechanisms underlying both changes in drug pharmacokinetics and drug-drug interactions. Consequently, there has been considerable interest in recent years in the in silico modeling of P-gp inhibitors. Because a large number of molecular descriptors are used to characterize structurally diverse molecules, efficient feature selection methods are required to extract the most informative predictors. In this work, we constructed an extensive data set of 2428 molecules, comprising 1518 P-gp inhibitors and 910 P-gp noninhibitors, compiled from multiple sources. Importantly, a two-step feature selection approach based on a genetic algorithm and a greedy forward-searching algorithm was employed to select the minimum set of the most informative descriptors that contribute to the prediction of P-gp inhibitors. To determine the best machine learning algorithm, 18 classifiers coupled with the feature selection method were compared. The three best-performing models (flexible discriminant analysis, support vector machine, and random forest) and their ensemble model, using only 3, 9, 7, and 14 descriptors, respectively, achieve an overall accuracy of 83.2%-86.7% for the training set containing 1040 compounds, an overall accuracy of 82.3%-85.5% for the test set containing 1039 compounds, and a prediction accuracy of 77.4%-79.9% for the external validation set containing 349 compounds. The models were further validated extensively against the DrugBank database (1890 compounds). The proposed models are competitive with, and in some cases better than, other published models in terms of prediction accuracy and minimum number of descriptors. The applicability domain was then addressed by developing an ensemble classification model to obtain more reliable predictions. Finally, we employed these models as a virtual screening tool for identifying potential P-gp inhibitors in the Traditional Chinese Medicine Systems Pharmacology (TCMSP) database, which contains a total of 13 051 unique compounds from 498 herbs, resulting in 875 potential P-gp inhibitors and 15 inhibitor-rich herbs. These predictions were partly supported by a literature search and are valuable not only for developing novel P-gp inhibitors from TCM in the early stages of drug development, but also for optimizing the use of herbal remedies.
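The greedy forward-searching stage mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the synthetic data, the nearest-centroid scorer, the function names, and the stopping rule are all assumptions made for illustration. The genetic-algorithm stage is omitted, and the six candidate descriptors stand in for whatever pool that first stage would have produced.

```python
import random

def centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Hold-out accuracy of a nearest-centroid classifier using only `features`."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, y in zip(X_test, y_test):
        def dist(label):
            return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))
        correct += min(sorted(centroids), key=dist) == y
    return correct / len(y_test)

def greedy_forward_select(candidates, score, min_gain=1e-9):
    """Repeatedly add the candidate that most improves `score`.

    Stops as soon as the best remaining candidate fails to improve the
    score by more than `min_gain` (a hypothetical stopping rule).
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        top_score, top_f = max(
            ((score(selected + [f]), f) for f in remaining), key=lambda t: t[0]
        )
        if top_score - best <= min_gain:
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best

# Synthetic stand-in data (assumption): 6 descriptors, only 0 and 3 carry signal.
rng = random.Random(0)

def make_rows(n):
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)          # 1 = "inhibitor", 0 = "noninhibitor"
        row = [rng.gauss(0, 1) for _ in range(6)]
        row[0] += 3 * label
        row[3] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y

X_train, y_train = make_rows(100)
X_test, y_test = make_rows(100)

selected, best = greedy_forward_select(
    range(6),
    lambda feats: centroid_accuracy(X_train, y_train, X_test, y_test, feats),
)
print("selected descriptors:", selected, "hold-out accuracy:", best)
```

Because the search stops when no remaining descriptor improves the score, it tends to return a small subset, which is the same goal the abstract states for its descriptor selection. In practice the scorer would more plausibly be cross-validated accuracy of the classifier under study.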