RSC Advances

Predicting human intestinal absorption with modified random forest approach: a comprehensive evaluation of molecular representation, unbalanced data, and applicability domain issues

Abstract

As drug discovery grows more complex and risky, predicting human intestinal absorption (HIA) has become increasingly important. Several predictive models have been built to estimate the HIA of new drug-like compounds with acceptable accuracy, but some issues remain to be explored, including the limited and unbalanced HIA data, the performance of different types of descriptors, and the applicability domain of published models. To address these problems, we collected a relatively large dataset of 970 compounds and calculated 9 different types of descriptors for modeling. In every modeling run, a parameter named samplesize in the random forest (RF) method was used to balance the dataset. Classification models were then built from different training sets and different combinations of descriptors. By comparing the statistical results across these models, we examined the issues above, evaluated the reliability of existing HIA classification models, and obtained a robust and applicable model based on a combination of 2D, 3D, N+ and Nrule-of-five descriptors (training set: SE = 0.892, SP = 0.846; test set: SE = 0.877, SP = 0.813). Compared with other published models, our model shows advantages in data size, accuracy and practicality. This structure–activity relationship model is useful for HIA prediction and could serve as a convenient tool for virtual screening in the early stages of drug development.
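The abstract does not say how samplesize is set or how SE and SP are computed. The sketch below is one plausible reading, not the authors' implementation. It assumes that samplesize fixes the number of compounds drawn per class (with replacement) for each tree, and that SE and SP are the usual sensitivity TP / (TP + FN) and specificity TN / (TN + FP). The function names `balanced_bootstrap` and `sensitivity_specificity`, and the toy labels, are hypothetical.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, n_per_class, rng):
    """Draw indices for one tree: n_per_class draws with replacement from each class.

    Assumption: this mimics a per-class sample size in a random forest;
    the paper's exact procedure is not given in the abstract.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    sample = []
    for y in sorted(by_class):
        sample.extend(rng.choices(by_class[y], k=n_per_class))
    return sample


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SE, SP), assuming SE = TP/(TP+FN) and SP = TN/(TN+FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    se = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return se, sp


# Toy, made-up labels: 8 well-absorbed (1) versus 2 poorly absorbed (0) compounds.
labels = [1] * 8 + [0] * 2
idx = balanced_bootstrap(labels, n_per_class=2, rng=random.Random(0))
drawn = [labels[i] for i in idx]  # two draws from each class

se, sp = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Reporting SE and SP separately, rather than overall accuracy alone, keeps the minority class visible: a model that labels every compound as well absorbed would score high accuracy on unbalanced data but have SP = 0.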