
Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes


Abstract

In a recidivism prediction context, there is no consensus on which modeling strategy should be followed to obtain an optimal prediction model. In previous papers, a range of statistical and machine learning techniques were benchmarked on recidivism data with a binary outcome. However, two important tree ensemble methods, namely gradient boosting and random forests, were not extensively evaluated. In this paper, we further explore the modeling potential of these techniques in the binary outcome criminal prediction context. Additionally, we explore the predictive potential of classical statistical and machine learning methods for censored time-to-event data. A range of manually specified statistical models and (semi-)automatic machine learning models is fitted on Dutch recidivism data, both for the binary outcome case and the censored outcome case. To enhance the generalizability of the results, the same models are applied to two historical American data sets, the North Carolina prison data. For all data sets, (semi-)automatic modeling in the binary case seems to provide no improvement over an appropriately manually specified traditional statistical model. There is, however, evidence of slightly improved performance of gradient boosting on survival data. Results on the reconviction data from two sources suggest that both statistical and machine learning approaches should be tried when seeking an optimal model. Even if a flexible black-box model does not improve upon the predictions of a manually specified model, it can serve as a test of whether important interactions are missing or other misspecifications of the model are present, and can thus provide more security in the modeling process.
