U.S. National Institutes of Health literature > SAGE Choice

Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models

Abstract

We conducted an extensive set of empirical analyses to examine the effect of the number of events per variable (EPV) on the relative performance of three methods for assessing the predictive accuracy of a logistic regression model: apparent performance in the analysis sample, split-sample validation, and optimism correction using bootstrap methods. Using a single dataset of patients hospitalized with heart failure, we compared the estimates of discriminatory performance from these methods with those obtained in a very large independent validation sample drawn from the same population. As anticipated, apparent performance was optimistically biased, and the degree of optimism diminished as the number of events per variable increased. Differences between the bootstrap-corrected approach and the use of an independent validation sample were minimal once the number of events per variable was at least 20. Split-sample assessment produced overly pessimistic and highly uncertain estimates of model performance. Apparent performance estimates had lower mean squared error than split-sample estimates, but bootstrap optimism-corrected estimates had the lowest mean squared error of all. For the bias, variance, and mean squared error of the performance estimates, the penalty incurred by split-sample validation was equivalent to reducing the sample size by the proportion of the sample withheld for model validation. In conclusion, for the internal validation of regression-based prediction models, split-sample validation is inefficient and apparent performance is too optimistic. Modern validation methods, such as bootstrap-based optimism correction, are preferable. While these findings may be unsurprising to many statisticians, the results of the current study reinforce what should be considered good statistical practice in the development and validation of clinical prediction models.
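The three internal-validation strategies compared above can be illustrated with a small, standard-library-only Python sketch. It rests on several assumptions that do not come from the abstract. Discrimination is measured by the C-statistic (ROC AUC). The bootstrap correction follows the widely used optimism bootstrap: refit in each resample, then subtract the average optimism from the apparent C. The data come from a hypothetical simulated model rather than the heart-failure cohort. The model is fitted by a simple fixed-step gradient ascent rather than whatever software the authors used. All function names and parameters (`B`, `train_frac`, `beta`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_predictor(w, X):
    """Intercept w[0] plus sum of w[j+1] * x[j] for each row of X."""
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)) for x in X]

def fit_logistic(X, y, iters=100, lr=1.0):
    """Unpenalised logistic regression via fixed-step gradient ascent on the
    mean log-likelihood (an illustrative stand-in for a Newton/IRLS fitter)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for x, yi, z in zip(X, y, linear_predictor(w, X)):
            err = yi - sigmoid(z)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def c_statistic(scores, y):
    """ROC AUC via the Mann-Whitney rank formula (tied scores get half credit)."""
    n1 = sum(y)
    n0 = len(y) - n1
    if n1 == 0 or n0 == 0:
        return float("nan")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)

def events_per_variable(y, n_predictors):
    """EPV, counting 'events' as the less frequent outcome class (a common convention)."""
    return min(sum(y), len(y) - sum(y)) / n_predictors

def apparent_c(X, y):
    """Fit and evaluate on the same sample (optimistically biased)."""
    return c_statistic(linear_predictor(fit_logistic(X, y), X), y)

def split_sample_c(X, y, train_frac=2 / 3, seed=0):
    """Fit on a random training fraction, evaluate on the held-out remainder."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    k = int(len(idx) * train_frac)
    tr, te = idx[:k], idx[k:]
    w = fit_logistic([X[i] for i in tr], [y[i] for i in tr])
    return c_statistic(linear_predictor(w, [X[i] for i in te]), [y[i] for i in te])

def bootstrap_corrected_c(X, y, B=30, seed=0):
    """Optimism bootstrap: apparent C minus the mean over resamples of
    (C of the bootstrap model on its own resample - C of that model on the original data)."""
    rng = random.Random(seed)
    n = len(y)
    optimism = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if 0 < sum(yb) < n:  # skip degenerate resamples containing a single outcome class
            wb = fit_logistic(Xb, yb)
            optimism.append(c_statistic(linear_predictor(wb, Xb), yb)
                            - c_statistic(linear_predictor(wb, X), y))
    return apparent_c(X, y) - sum(optimism) / len(optimism)

def simulate(n, beta, intercept, rng):
    """Hypothetical data-generating model with independent standard-normal predictors."""
    X = [[rng.gauss(0, 1) for _ in beta] for _ in range(n)]
    y = [int(rng.random() < sigmoid(intercept + sum(b * v for b, v in zip(beta, x)))) for x in X]
    return X, y

if __name__ == "__main__":
    rng = random.Random(42)
    beta = [0.8, -0.5, 0.3, 0.0, 0.0]  # hypothetical true coefficients
    X, y = simulate(150, beta, -1.0, rng)
    X_val, y_val = simulate(2000, beta, -1.0, rng)  # large independent validation sample
    print("EPV:", events_per_variable(y, len(beta)))
    print("apparent C:", round(apparent_c(X, y), 3))
    print("split-sample C:", round(split_sample_c(X, y), 3))
    print("bootstrap-corrected C:", round(bootstrap_corrected_c(X, y), 3))
    print("independent-validation C:",
          round(c_statistic(linear_predictor(fit_logistic(X, y), X_val), y_val), 3))
```

A single run prints the four estimates for one simulated sample only. To compare bias, variance, and mean squared error across EPV levels, as the study does, you would repeat this over many simulated or resampled datasets and summarise how each estimate deviates from the independent-validation value.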