SAR and QSAR in Environmental Research

A novel approach to generate robust classification models to predict developmental toxicity from imbalanced datasets

Abstract

Computational models that predict the developmental toxicity of compounds are built on imbalanced datasets in which toxicants outnumber non-toxicants. Consequently, the results are biased towards the majority class (toxicants). To overcome this problem and obtain classifiers that are sensitive as well as accurate, we followed an integrated approach in which (i) the Synthetic Minority Over-sampling Technique (SMOTE) is used for re-sampling, (ii) a genetic algorithm (GA) is used for variable selection and (iii) support vector machines (SVM) are used for model development. The best model, M3, has (i) sensitivity (SE) = 85.54% and specificity (SP) = 85.62% in leave-one-out validation, (ii) training-set classification accuracy = 99.67%, (iii) test-set classification accuracy = 92.59%, and (iv) test-set sensitivity = 92.68% and specificity = 92.31%. Consensus prediction based on models M3–M5 improved these percentages by 5% over M3. From the analysis of the results we infer that data imbalance in toxicity studies can be effectively addressed by applying re-sampling techniques.
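The re-sampling step can be illustrated with a minimal, dependency-free sketch of SMOTE-style oversampling. The `smote` helper below is hypothetical and is not the authors' implementation. The abstract does not state the number of neighbours, the distance metric or how many synthetic compounds were generated, so `k=5` (the default `k_neighbors` of imbalanced-learn's `SMOTE`), Euclidean distance and the toy descriptors are assumptions. Each synthetic non-toxicant is placed at a random point on the segment between a real non-toxicant and one of its nearest non-toxicant neighbours, which is the interpolation rule SMOTE is built on.

```python
import random
from typing import List, Sequence


def _squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptor vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(
    minority: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """Return n_synthetic new minority-class vectors (hypothetical helper).

    Each synthetic vector lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # k nearest neighbours of every minority sample, searched within the
    # minority class only and excluding the sample itself.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: _squared_distance(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy descriptors for 3 hypothetical non-toxicants (the minority class).
    non_toxicants = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    n_toxicants = 7  # hypothetical majority-class count
    new_points = smote(non_toxicants, n_synthetic=n_toxicants - len(non_toxicants), k=2)
    print(len(non_toxicants) + len(new_points))  # prints 7, matching the majority class
```

Oversampling is usually applied to the training data only, so that synthetic compounds interpolated from test-set structures cannot inflate test-set performance.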
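The reported sensitivity, specificity and accuracy figures, and the consensus step, can be made concrete with a short sketch. It assumes that toxicants are the positive class (label 1), so SE is the percentage of toxicants predicted toxic and SP the percentage of non-toxicants predicted non-toxic. The abstract does not say how the consensus of M3–M5 was formed, so the simple majority vote in the hypothetical `majority_vote` helper is an assumption rather than the authors' rule.

```python
import math
from typing import Dict, List, Sequence

TOXIC, NON_TOXIC = 1, 0  # assumed label encoding


def _percent(numerator: int, denominator: int) -> float:
    """Percentage, or NaN when the denominator is zero (class absent)."""
    return 100.0 * numerator / denominator if denominator else math.nan


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity (SE), specificity (SP) and accuracy, in percent."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == TOXIC and p == TOXIC)
    fn = sum(1 for t, p in pairs if t == TOXIC and p != TOXIC)
    tn = sum(1 for t, p in pairs if t != TOXIC and p != TOXIC)
    fp = sum(1 for t, p in pairs if t != TOXIC and p == TOXIC)
    return {
        "SE": _percent(tp, tp + fn),  # share of toxicants correctly flagged
        "SP": _percent(tn, tn + fp),  # share of non-toxicants correctly cleared
        "accuracy": _percent(tp + tn, len(pairs)),
    }


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Consensus label per compound; predictions[m][i] is model m's label for compound i.

    A compound is called toxic only if more than half of the models say so,
    so with an even number of models a tie is resolved as non-toxic.
    """
    n_models = len(predictions)
    if n_models == 0:
        raise ValueError("need at least one model")
    return [
        TOXIC if 2 * sum(1 for v in votes if v == TOXIC) > n_models else NON_TOXIC
        for votes in zip(*predictions)
    ]


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    print(classification_metrics(y_true, y_pred))
    # prints {'SE': 75.0, 'SP': 75.0, 'accuracy': 75.0}
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # prints [1, 0, 1]
```

With three models, as in M3–M5, a majority vote can never tie; the tie-breaking rule only matters for an even number of models.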