Source: Journal of Chemical Information and Modeling

Two new parameters based on distances in a receiver operating characteristic chart for the selection of classification models

Abstract

Several indices describe different aspects of the performance of QSAR classification models, and the area under the Receiver Operating Characteristic (ROC) curve remains the most powerful test for assessing that performance overall. All ROC-related parameters can be calculated for both the training and test sets, but none of them is, by itself, an absolute indicator of classification performance. Moreover, one of the biggest drawbacks is the computing time needed to obtain the area under the ROC curve, which slows down any calculation algorithm. The present study proposes two new parameters based on distances in a ROC chart for selecting classification models with an appropriate balance between the training and test sets: the ROC graph Euclidean distance (ROCED) and the ROC graph Euclidean distance corrected with the Fitness Function (FIT), termed ROCFIT. The behavior of these indices was examined in a study of the mutagenicity of a number of nonaromatic halogenated derivatives across four genotoxicity end points. The ROCED parameter achieved a better balance between sensitivity and specificity, for both the training and prediction sets, than other indices such as the Matthews correlation coefficient, Wilks' lambda, or the area under the ROC curve. However, the linear discriminant models selected with ROCED showed lower statistical significance. The other parameter, ROCFIT, retains the capabilities of ROCED while improving the significance of the models through the inclusion of FIT.
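The abstract does not give the formulas for ROCED or ROCFIT, so the following is only a minimal sketch of the kind of quantity such parameters could build on: the Euclidean distance, in ROC space, between a classifier's operating point (false positive rate, true positive rate) and the perfect classifier at (0, 1). The function names, the confusion-matrix counts, and the choice of (0, 1) as the reference point are all assumptions made for illustration; the code does not reproduce the paper's ROCED or ROCFIT definitions.

```python
import math


def confusion_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


def roc_distance(sensitivity, specificity):
    """Euclidean distance from the ROC point (FPR, TPR) to the ideal point (0, 1).

    Assumption: the perfect classifier at (0, 1) is the reference point.
    0 means perfect classification; sqrt(2) is the worst possible value.
    """
    fpr = 1.0 - specificity
    tpr = sensitivity
    return math.hypot(fpr, 1.0 - tpr)


# Hypothetical counts, chosen only to illustrate the calculation.
train_sens, train_spec = confusion_rates(tp=40, fn=10, tn=45, fp=5)
test_sens, test_spec = confusion_rates(tp=8, fn=2, tn=7, fp=3)

d_train = roc_distance(train_sens, train_spec)
d_test = roc_distance(test_sens, test_spec)

# A small distance on both sets, and a small gap between them, would
# suggest a model that is accurate and balanced across the two sets.
print(f"training distance: {d_train:.4f}")
print(f"test distance:     {d_test:.4f}")
print(f"absolute gap:      {abs(d_train - d_test):.4f}")
```

Unlike the area under the ROC curve, a distance of this kind needs only one sensitivity/specificity pair per data set, which is consistent with the abstract's concern about computing time.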
