
Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds


Abstract

Support vector machine (SVM) modeling is one of the most popular machine learning approaches in chemoinformatics and drug design. The influence of training set composition and size on predictions is currently an underinvestigated issue in SVM modeling. In this study, we have derived SVM classification and ranking models for a variety of compound activity classes under systematic variation of the number of positive and negative training examples. With increasing numbers of negative training compounds, SVM classification calculations became increasingly accurate and stable. However, this was only the case if a required threshold of positive training examples was also reached. In addition, consideration of class weights and optimization of cost factors substantially aided in balancing the calculations for increasing numbers of negative training examples. Taken together, the results of our analysis have practical implications for SVM learning and the prediction of active compounds. For all compound classes under study, top recall performance and independence of compound recall from training set composition were achieved when 250–500 active and 500–1000 randomly selected inactive training instances were used. However, as long as ∼50 known active compounds were available for training, increasing numbers of 500–1000 randomly selected negative training examples significantly improved model performance and gave very similar results for different training sets.
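
The experimental design described in the abstract (varying the numbers of positive and negative training examples, applying class weights, and measuring compound recall) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the specific grid values (chosen only to echo the 50, 250–500, and 500–1000 figures quoted above), and the inverse-frequency weighting formula, which is a common heuristic and not necessarily the scheme the authors used. No SVM is trained here; the sketch only shows how a composition grid, per-class weights, and a recall measure might be set up.

```python
import random

# Illustrative grid only; the abstract does not list the exact values studied.
POSITIVE_COUNTS = [50, 250, 500]
NEGATIVE_COUNTS = [500, 1000]


def balanced_class_weights(n_pos, n_neg):
    """Weight each class inversely to its frequency: n_total / (2 * n_class).

    Assumed heuristic; the minority (active) class gets the larger weight,
    which counteracts a growing number of negative training examples.
    """
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), 0: total / (2 * n_neg)}


def sample_training_set(actives, inactives, n_pos, n_neg, seed=0):
    """Draw n_pos actives and n_neg inactives without replacement."""
    rng = random.Random(seed)
    return rng.sample(actives, n_pos), rng.sample(inactives, n_neg)


def recall_at_k(ranked_labels, k):
    """Fraction of all actives (label 1) that appear in the top k of a ranking."""
    total_actives = sum(ranked_labels)
    if total_actives == 0:
        return 0.0
    return sum(ranked_labels[:k]) / total_actives


# One entry per training set composition: (n_pos, n_neg, class weights).
composition_grid = [
    (n_pos, n_neg, balanced_class_weights(n_pos, n_neg))
    for n_pos in POSITIVE_COUNTS
    for n_neg in NEGATIVE_COUNTS
]

if __name__ == "__main__":
    for n_pos, n_neg, weights in composition_grid:
        print(n_pos, n_neg, weights)
```

In a full experiment, each grid entry would be used to sample a training set, fit a weighted SVM with a tuned cost factor, rank a held-out screening set, and record recall of active compounds across repeated random samples.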
