Journal: Computational Statistics

Predicting missing values: a comparative study on non-parametric approaches for imputation


Abstract

Missing data are an expected issue when large amounts of data are collected, and several imputation techniques have been proposed to tackle this problem. Besides classical approaches such as MICE, the application of machine learning techniques is tempting. Here, the recently proposed missForest imputation method has shown high imputation accuracy under the Missing (Completely) at Random scheme with various missing rates. At its core, it is based on a random forest for classification and regression, respectively. In this paper we study whether this approach can be further enhanced by other methods such as the stochastic gradient tree boosting method, the C5.0 algorithm, BART, or modified random forest procedures. In particular, other resampling strategies within the random forest protocol are suggested. In an extensive simulation study, we analyze their performance for continuous, categorical, and mixed-type data. An empirical analysis focusing on credit information and Facebook data complements our investigations.
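To make the evaluation setting in the abstract concrete, the following is a minimal sketch of how imputation accuracy might be measured under a Missing Completely at Random scheme at several missing rates. It is not the authors' procedure: column-mean imputation is used only as a simple stand-in for missForest or the other methods, the normalized RMSE over masked entries is an assumed accuracy measure, and the function names (`mask_mcar`, `impute_column_mean`, `nrmse`) and the synthetic data are hypothetical.

```python
import math
import random


def mask_mcar(rows, rate, rng):
    """Copy `rows`, setting each entry to None independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def impute_column_mean(rows):
    """Stand-in imputer (assumption): fill each missing entry with its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column happens to be entirely missing.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def nrmse(truth, masked, imputed):
    """Root of (mean squared error / variance of true values), over masked entries only."""
    true_vals, imp_vals = [], []
    for t_row, m_row, i_row in zip(truth, masked, imputed):
        for t, m, i in zip(t_row, m_row, i_row):
            if m is None:
                true_vals.append(t)
                imp_vals.append(i)
    if not true_vals:
        raise ValueError("no entries were masked")
    mse = sum((t - i) ** 2 for t, i in zip(true_vals, imp_vals)) / len(true_vals)
    mean_t = sum(true_vals) / len(true_vals)
    var_t = sum((t - mean_t) ** 2 for t in true_vals) / len(true_vals)
    if var_t == 0:
        raise ValueError("true values at masked entries have zero variance")
    return math.sqrt(mse / var_t)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic continuous data with two columns (illustrative only).
    data = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(200)]
    for rate in (0.1, 0.2, 0.3):
        masked = mask_mcar(data, rate, rng)
        imputed = impute_column_mean(masked)
        print(f"missing rate {rate:.1f}: NRMSE {nrmse(data, masked, imputed):.3f}")
```

Any other imputer could be swapped in for `impute_column_mean` as long as it returns a fully filled copy of the masked data, which allows methods to be compared on the same masks.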
