The influence of negative training set size on machine learning-based virtual screening

Rafał Kurczab; Sabina Smusz; Andrzej J Bojarski

Abstract

Background: The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods.

Results: The impact of this rather neglected aspect of applying machine learning methods was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of the dynamics of those variations allowed us to recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be largely insensitive to changes in the number of negative instances in the training set.

Conclusions: The ratio of positive to negative training instances should be taken into account when preparing machine learning experiments, as it might significantly influence the performance of a particular classifier. Moreover, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.
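The evaluation parameters named in the abstract (precision, hit recall and MCC) can all be computed from the four confusion-matrix counts. The experimental design, a fixed set of positives with a varying number of randomly drawn negatives, can be sketched in a few lines. The Python below is only an illustration under stated assumptions and is not the authors' workflow: the function names, the `ratios` parameter and the seed are hypothetical, and the negative pool stands in for compounds sampled from ZINC.

```python
import math
import random


def precision(tp, fp):
    """Fraction of predicted actives that are true actives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true actives that were retrieved (hit recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def build_training_sets(positives, negative_pool, ratios, seed=0):
    """Pair a fixed positive set with negative samples of varying size.

    `ratios` is a hypothetical parameter: each value is the number of
    negatives drawn per positive. Negatives are sampled without
    replacement from `negative_pool`.
    """
    rng = random.Random(seed)
    sets = {}
    for ratio in ratios:
        n_neg = ratio * len(positives)
        sets[ratio] = (list(positives), rng.sample(negative_pool, n_neg))
    return sets
```

Training a classifier on each returned set and scoring it with the three metric functions on a common test set would show how precision, recall and MCC shift as the negative set grows.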

Bibliographic record

Source: Journal of Cheminformatics, 2014, Issue S1
Authors: Rafał Kurczab; Sabina Smusz; Andrzej J Bojarski
Original format: PDF
Chinese Library Classification: Chemistry