Statistical Methods and Applications

Variable selection techniques after multiple imputation in high-dimensional data

Abstract

High-dimensional data arise in diverse fields of scientific research, and such data often contain missing values. Variable selection plays a key role in high-dimensional data analysis, but, like many other statistical techniques, it requires complete cases with no missing values. A variety of variable selection techniques is available for complete data, but comparable techniques for data with missing values are lacking in the literature. Multiple imputation is a popular approach to handling missing values and obtaining completed data. If a particular variable selection technique is applied independently to each of the multiply imputed datasets, each dataset may yield a different model, and it remains unclear in the literature how variable selection should be carried out on multiply imputed data. In this paper, we propose selecting each candidate predictor on the basis of the magnitude of its parameter estimates across all the imputed datasets: a constraint is imposed on the sum of the absolute values of these estimates to select or remove the predictor from the model. The proposed method for identifying informative predictors is compared with other approaches in an extensive simulation study. For different numbers of imputed datasets, performance is compared in terms of hit rates (the proportion of informative predictors correctly identified) and false alarm rates (the proportion of non-informative predictors incorrectly identified as informative). The proposed technique is simple and easy to implement, and it performs as well in high-dimensional settings as in low-dimensional ones. Across the different simulation settings, it is observed to be a good competitor to existing approaches. The performance of the different variable selection techniques is also examined on a real dataset with missing values.
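To make the selection idea and the two evaluation measures concrete, here is a minimal Python sketch under clearly stated assumptions. It starts from coefficient estimates already fitted on each of M imputed datasets (the imputation and per-dataset fitting steps are not shown), sums each predictor's absolute estimates across the datasets, and keeps the predictor when that sum exceeds a threshold. This thresholding is only a simplified reading of the abstract's "constraint on the sum of absolute values"; it is not the paper's estimator, and the names `pooled_magnitude`, `select_predictors`, `hit_and_false_alarm`, and `threshold` are hypothetical. The hit and false alarm rates follow the definitions given in the abstract. All coefficient values are made-up toy numbers.

```python
from typing import List, Sequence, Set, Tuple


def pooled_magnitude(estimates: Sequence[Sequence[float]]) -> List[float]:
    """Return sum over m of |beta_jm| for each predictor j.

    estimates[m][j] is the (assumed, already fitted) coefficient of
    predictor j on imputed dataset m.
    """
    n_pred = len(estimates[0])
    if any(len(row) != n_pred for row in estimates):
        raise ValueError("every imputed dataset must report the same predictors")
    return [sum(abs(row[j]) for row in estimates) for j in range(n_pred)]


def select_predictors(estimates: Sequence[Sequence[float]], threshold: float) -> Set[int]:
    """Keep predictor j if its pooled magnitude exceeds `threshold`.

    Hypothetical simplified rule for illustration; not the paper's
    constrained estimator. `threshold` is an assumed tuning parameter.
    """
    return {j for j, s in enumerate(pooled_magnitude(estimates)) if s > threshold}


def hit_and_false_alarm(
    selected: Set[int], informative: Set[int], n_pred: int
) -> Tuple[float, float]:
    """Hit rate: share of informative predictors that were selected.
    False alarm rate: share of non-informative predictors that were selected."""
    non_informative = set(range(n_pred)) - informative
    hit = len(selected & informative) / len(informative) if informative else 0.0
    false_alarm = (
        len(selected & non_informative) / len(non_informative)
        if non_informative
        else 0.0
    )
    return hit, false_alarm


# Toy coefficients (made up) for 5 predictors on M = 3 imputed datasets;
# predictors 0 and 1 are taken to be the informative ones.
estimates = [
    [1.2, 0.0, 0.1, 0.0, 0.0],
    [0.9, 0.4, 0.0, 0.0, 0.3],
    [1.1, 0.5, 0.0, 0.0, 0.0],
]

# Reading each imputed dataset separately gives three different models.
per_dataset = [{j for j, b in enumerate(row) if b != 0.0} for row in estimates]
print(per_dataset)  # [{0, 2}, {0, 1, 4}, {0, 1}]

# Pooling the magnitudes across imputations gives a single model.
chosen = select_predictors(estimates, threshold=0.5)
print(chosen)  # {0, 1}
print(hit_and_false_alarm(chosen, informative={0, 1}, n_pred=5))  # (1.0, 0.0)
```

In the toy input, the per-dataset supports disagree, which is the ambiguity the abstract describes; pooling the magnitudes resolves it into one model. How the threshold (or the constraint it stands in for) should be chosen is not stated in the abstract, so any tuning scheme, such as cross-validation, would be an additional assumption.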