
Predictive accuracy measures for binary outcomes: Impact of incidence rate and optimization techniques.



Abstract

The performance of models predicting a binary outcome can be evaluated using a variety of measures. While some measures are intended to describe the model's overall fit, others more accurately describe the model's ability to discriminate between the two outcomes. If a model fits well but doesn't discriminate well, what does that tell us? Given two models, if one discriminates well but has poor fit while the other fits well but discriminates poorly, which of the two should we choose? The measures of interest for our research include the area under the ROC curve, Brier score, discrimination slope, log-loss, R², and F-beta. To examine the underlying relationships among all of the measures, real data and simulation studies are used.

The real data come from multiple cardiovascular research studies, and the simulation studies are run under general conditions and also for incidence rates ranging from 2% to 50%. The results of these analyses provide insight into the relationships among the measures and raise concerns about scenarios in which the measures may yield different conclusions. The impact of incidence rate on the relationships provides a basis for exploring alternative maximization routines to logistic regression.

While most of the measures are easily optimized using the Newton-Raphson algorithm, maximizing the area under the ROC curve requires optimization of a non-linear, non-differentiable function. Use of the Nelder-Mead simplex algorithm and close connections to economics research yield unique parameter estimates and general asymptotic conditions. Using real and simulated data to compare optimizing the area under the ROC curve with logistic regression further reveals the impact of incidence rate on the relationships, significant increases in achievable areas under the ROC curve, and differences in conclusions about including a variable in a model.
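The measures named in the abstract can be illustrated on a toy example. The sketch below is not the author's code: the function names, the 0.5 classification threshold, and β = 1 for F-beta are assumptions made for illustration, and R² is left out because the abstract does not say which variant is meant.

```python
import math

def auc(y, p):
    """Area under the ROC curve as the share of (event, non-event) pairs
    ranked correctly; ties count as half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def discrimination_slope(y, p):
    """Mean predicted probability among events minus that among non-events."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

def log_loss(y, p, eps=1e-15):
    """Negative mean Bernoulli log-likelihood; probabilities are clipped
    to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -total / len(y)

def f_beta(y, p, beta=1.0, threshold=0.5):
    """F-beta after classifying p >= threshold as an event
    (threshold and beta are illustrative assumptions)."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

# Toy outcomes and predicted probabilities (made up for illustration).
y_demo = [0, 0, 1, 1]
p_demo = [0.1, 0.4, 0.35, 0.8]
for name, fn_ in [("AUC", auc), ("Brier", brier_score),
                  ("Slope", discrimination_slope),
                  ("Log-loss", log_loss), ("F1", f_beta)]:
    print(name, round(fn_(y_demo, p_demo), 4))
```

AUC and the discrimination slope look only at how events are separated from non-events, whereas the Brier score and log-loss also penalize miscalibrated probabilities, which is one reason the measures can disagree.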
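The abstract's point that the area under the ROC curve is not differentiable, and so calls for a derivative-free method such as Nelder-Mead, can be sketched as follows. Everything here is an illustrative assumption rather than the thesis's procedure: the data are simulated, the score is linear, the Nelder-Mead routine is a simplified hand-written variant, and the first coefficient is fixed at 1 because the AUC of a linear score does not change when all coefficients are multiplied by a positive constant.

```python
import random

def auc(y, s):
    """Share of (event, non-event) pairs ranked correctly; ties count half."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def nelder_mead(f, x0, step=0.5, max_iter=200, tol=1e-6):
    """Minimise f with a basic Nelder-Mead simplex (no derivatives used).
    Simplified variant: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    pts = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                        for i in range(n)]
    simplex = [(f(x), x) for x in pts]
    for _ in range(max_iter):
        simplex.sort(key=lambda t: t[0])
        best, worst = simplex[0], simplex[-1]
        # Stop once every vertex has collapsed onto the best one.
        size = max(abs(a - b) for v in simplex[1:]
                   for a, b in zip(v[1], best[1]))
        if size < tol:
            break
        centroid = [sum(v[1][j] for v in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point centroid + t * (centroid - worst), with its f value.
            x = [c + t * (c - w) for c, w in zip(centroid, worst[1])]
            return (f(x), x)

        refl = along(1.0)
        if refl[0] < best[0]:
            expd = along(2.0)
            simplex[-1] = expd if expd[0] < refl[0] else refl
        elif refl[0] < simplex[-2][0]:
            simplex[-1] = refl
        else:
            contr = along(-0.5)
            if contr[0] < worst[0]:
                simplex[-1] = contr
            else:
                # Shrink every other vertex halfway towards the best one.
                shrunk = [[b + 0.5 * (vj - b) for b, vj in zip(best[1], v[1])]
                          for v in simplex[1:]]
                simplex = [best] + [(f(x), x) for x in shrunk]
    return min(simplex, key=lambda t: t[0])

# Simulated data (assumption): 10% incidence, two informative predictors
# and one noise predictor.
random.seed(0)
n, incidence = 200, 0.10
y = [1] * int(n * incidence) + [0] * (n - int(n * incidence))
X = [[random.gauss(0.8 * yi, 1.0), random.gauss(0.5 * yi, 1.0),
      random.gauss(0.0, 1.0)] for yi in y]

def neg_auc(b):
    # Score x1 + b[0]*x2 + b[1]*x3; the coefficient of x1 is fixed at 1.
    return -auc(y, [x[0] + b[0] * x[1] + b[1] * x[2] for x in X])

auc_start = -neg_auc([0.0, 0.0])          # score uses x1 only
neg_best, b_best = nelder_mead(neg_auc, [0.0, 0.0])
auc_opt = -neg_best
print("AUC at start:", round(auc_start, 4))
print("AUC after Nelder-Mead:", round(auc_opt, 4), "coefficients:", b_best)
```

Because the empirical AUC is piecewise constant in the coefficients, the simplex can stall on a plateau, so the result is a local improvement over the starting point rather than a guaranteed global maximum.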

Bibliographic details

  • Author: Scolnik, Ryan
  • Author affiliation: The Florida State University
  • Degree-granting institution: The Florida State University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 107
  • Format: PDF
  • Language: English
