Journal: Frontiers in Chemistry

Prediction Is a Balancing Act: Importance of Sampling Methods to Balance Sensitivity and Specificity of Predictive Models Based on Imbalanced Chemical Data Sets

Abstract

The increase in the number of new chemicals synthesized in past decades has driven steady growth in the development and application of computational models for predicting the activity and safety profiles of chemicals. Most of the time, such computational models and their applications must deal with imbalanced chemical data, and constructing a classifier from an imbalanced data set is a challenge. In this study, we analyzed and validated the importance of different sampling methods, compared with no sampling, for achieving well-balanced sensitivity and specificity in a machine learning model trained on imbalanced chemical data. Additionally, this study achieved an accuracy of 93.00%, an AUC of 0.94, an F1 measure of 0.90, a sensitivity of 96.00%, and a specificity of 91.00% using SMOTE sampling and a Random Forest classifier for the prediction of Drug Induced Liver Injury (DILI). Our results suggest that, irrespective of the data set used, sampling methods can have a major influence on reducing the gap between the sensitivity and specificity of a model. This study demonstrates the efficacy of different sampling methods for the class imbalance problem using binary chemical data sets.
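To make the terms in the abstract concrete, the following is a minimal sketch, not the authors' pipeline, of the two ideas it relies on: SMOTE-style oversampling of the minority class and the sensitivity/specificity pair used to judge balance. It assumes binary labels (1 = positive, e.g. DILI-positive) and numeric feature vectors. The function names `smote_like` and `sensitivity_specificity`, the parameter `k`, and the toy data are all hypothetical and chosen only for illustration. A real study would more likely use an established implementation of SMOTE and of the Random Forest classifier.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    spec = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return sens, spec


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (SMOTE-style sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy imbalanced data (hypothetical): 6 negatives, 3 positives.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Oversample the minority class until both classes have the same size.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

The balanced set would then be passed to a classifier for training, and `sensitivity_specificity` applied to held-out predictions. A large gap between the two values is the symptom of class imbalance that the abstract reports sampling can reduce.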
