Journal: Journal of Chemical Information and Modeling

Comparison of random forest and Pipeline Pilot Naïve Bayes in prospective QSAR predictions



Abstract

Random forest is currently considered one of the best QSAR methods available in terms of prediction accuracy. However, it is computationally intensive. Naïve Bayes is a simple, robust classification method. The Laplacian-modified Naïve Bayes implementation is the preferred QSAR method in the widely used commercial chemoinformatics platform Pipeline Pilot. We compared the ability of Pipeline Pilot Naïve Bayes (PLPNB) and random forest to make accurate predictions on 18 large, diverse in-house QSAR data sets. These include on-target and ADME-related activities. The data sets were set up as classification problems with either binary or multicategory activities. We used a time-split method of dividing training and test sets, as we feel this is a realistic way of simulating prospective prediction. PLPNB is computationally efficient. However, on our data sets, random forest predictions are at least as good as those of PLPNB and in many cases significantly better. PLPNB performs better with ECFP4 and ECFP6 descriptors, which are native to Pipeline Pilot, and more poorly with the other descriptors we tried.
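
The time-split idea mentioned in the abstract can be illustrated with a minimal sketch: compounds are ordered by the date they were tested, the earliest ones form the training set, and the most recent ones form the test set, so that the model is always evaluated on compounds made after those it was trained on. The abstract does not give the split ratio or the data layout, so the `Record` fields, the `time_split` function name, and the 75% training fraction below are assumptions made purely for illustration, not the authors' procedure.

```python
from datetime import date
from typing import List, NamedTuple, Sequence, Tuple


class Record(NamedTuple):
    """One assay result. The field names are hypothetical."""
    compound_id: str
    tested_on: date
    label: int  # activity class, e.g. 0 = inactive, 1 = active


def time_split(
    records: Sequence[Record], train_fraction: float = 0.75
) -> Tuple[List[Record], List[Record]]:
    """Split records chronologically: oldest go to training, newest to test.

    train_fraction is an assumed value; the abstract does not state the ratio.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Sort by test date so the split boundary is a point in time.
    ordered = sorted(records, key=lambda r: r.tested_on)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy data, invented for the example.
records = [
    Record("cmpd-4", date(2011, 3, 1), 1),
    Record("cmpd-1", date(2009, 5, 12), 0),
    Record("cmpd-3", date(2010, 8, 30), 1),
    Record("cmpd-2", date(2009, 11, 2), 0),
]
train, test = time_split(records)
```

Unlike a random split, this keeps every test compound later in time than every training compound, which is the property that makes the evaluation resemble prospective use.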
