...
Journal of Chemical Information and Modeling

GA(M)E-QSAR: A novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design



Abstract

Computer-aided drug design has become an important component of the drug discovery process. Despite advances in this field, no single modeling approach can be successfully applied to the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm on five data sets from the literature. We found that it can yield classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. The final models of the Adaboost Ensembles derived from the Genetic Algorithm search are quite simple, since they consist of a weighted sum of the outputs of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
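The final model described above is a weighted sum of single-feature classifier outputs, whose score doubles as a ranking criterion. The following is a minimal Python sketch of that idea, not the authors' implementation. It rests on three assumptions:

- Discrete AdaBoost with one-feature decision stumps as the weak learners.
- Labels coded as +1 (active) and -1 (inactive).
- A descriptor subset that has already been chosen.

The Genetic Algorithm search over subsets and the Meta-Ensemble layer are not shown. The function names (`fit_stump`, `adaboost`, `score`, `rank`) are hypothetical.

```python
import math

# Hypothetical sketch: discrete AdaBoost over single-feature decision stumps.
# Assumes labels are +1 (active) / -1 (inactive) and that `features` is a
# descriptor subset chosen beforehand (e.g. by a GA search, not shown here).


def stump_predict(row, feature, thr, polarity):
    """Single-feature classifier: +polarity above the threshold, -polarity otherwise."""
    return polarity if row[feature] > thr else -polarity


def fit_stump(X, y, w, feature):
    """Best (weighted_error, threshold, polarity) stump on one feature."""
    values = sorted(set(row[feature] for row in X))
    # Candidate thresholds: below the minimum, then midpoints between distinct values.
    candidates = [values[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    best = (float("inf"), None, None)
    for thr in candidates:
        for polarity in (1, -1):
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if stump_predict(row, feature, thr, polarity) != yi)
            if err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(X, y, features, n_rounds=10):
    """Return a list of (alpha, feature, threshold, polarity) terms."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        # Pick the stump with the lowest weighted error across the feature subset.
        err, thr, pol, feat = min(
            (fit_stump(X, y, w, f) + (f,) for f in features),
            key=lambda t: t[0],
        )
        if err >= 0.5:
            break  # weak learner no better than chance
        eps = max(err, 1e-10)  # guard against log of infinity for a perfect stump
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        model.append((alpha, feat, thr, pol))
        if err == 0:
            break  # training set already separated
        # Misclassified samples gain weight; correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feat, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, row):
    """AdaBoost score: weighted sum of the single-feature classifier outputs."""
    return sum(alpha * stump_predict(row, f, thr, pol) for alpha, f, thr, pol in model)


def predict(model, row):
    """Binary label from the sign of the score."""
    return 1 if score(model, row) >= 0 else -1


def rank(model, rows):
    """Row indices sorted by decreasing score, i.e. most likely actives first."""
    return sorted(range(len(rows)), key=lambda i: score(model, rows[i]), reverse=True)
```

For example, `model = adaboost(X, y, [0, 3, 7])` would fit an ensemble restricted to descriptors 0, 3 and 7. `rank(model, library)` would then order a screening library by score. Because each term touches only one descriptor, the fitted model stays readable: every term's `alpha` states how much that descriptor's threshold contributes to the decision.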
