Variable selection techniques after multiple imputation in high-dimensional data

Faisal Maqbool Zahid; Shahla Faisal; Christian Heumann


Abstract

High-dimensional data arise in diverse fields of scientific research, and missing values are often encountered in such data. Variable selection plays a key role in high-dimensional data analysis. Like many other statistical techniques, variable selection requires complete cases without any missing values. A variety of variable selection techniques is available for complete data, but comparable techniques for data with missing values are scarce in the literature. Multiple imputation is a popular approach for handling missing values and obtaining completed data. If a particular variable selection technique is applied independently to each of the multiply imputed datasets, a different model may result for each dataset. It is still unclear in the literature how to implement variable selection techniques on multiply imputed data. In this paper, we propose to use the magnitude of the parameter estimates of each candidate predictor across all the imputed datasets for its selection. A constraint is imposed on the sum of absolute values of these estimates to select or remove the predictor from the model. The proposed method for identifying the informative predictors is compared with other approaches in an extensive simulation study. The performance is compared on the basis of the hit rates (proportion of correctly identified informative predictors) and the false alarm rates (proportion of non-informative predictors wrongly identified as informative) for different numbers of imputed datasets. The proposed technique is simple and easy to implement, and performs as well in the high-dimensional case as in low-dimensional settings. The proposed technique is observed to be a good competitor to the existing approaches in different simulation settings. The performance of different variable selection techniques is also examined for a real dataset with missing values.
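The selection idea and the two evaluation measures described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the exact form of the constraint, so the simple rule used here (keep a predictor when the sum of its absolute estimates across imputations exceeds a cutoff) is an assumption, as are the function names, the `threshold` parameter, and the toy coefficient values.

```python
from typing import Iterable, List, Sequence


def select_by_abs_sum(estimates: Sequence[Sequence[float]],
                      threshold: float) -> List[int]:
    """Return indices of predictors whose summed |estimate| exceeds threshold.

    estimates[m][j] is the coefficient of predictor j fitted on imputed
    dataset m. The cutoff rule is a hypothetical stand-in for the paper's
    constraint, which the abstract does not spell out.
    """
    if not estimates:
        return []
    n_predictors = len(estimates[0])
    selected = []
    for j in range(n_predictors):
        abs_sum = sum(abs(row[j]) for row in estimates)
        if abs_sum > threshold:
            selected.append(j)
    return selected


def hit_rate(selected: Iterable[int], informative: Iterable[int]) -> float:
    """Proportion of truly informative predictors that were selected."""
    informative = set(informative)
    if not informative:
        return 0.0
    return len(set(selected) & informative) / len(informative)


def false_alarm_rate(selected: Iterable[int], informative: Iterable[int],
                     n_predictors: int) -> float:
    """Proportion of non-informative predictors that were selected."""
    informative = set(informative)
    n_noise = n_predictors - len(informative)
    if n_noise == 0:
        return 0.0
    return len(set(selected) - informative) / n_noise


# Toy example (made-up numbers): 3 imputed datasets, 4 candidate predictors.
estimates = [
    [1.2, 0.0, 0.4, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [1.1, 0.0, 0.5, 0.0],
]
selected = select_by_abs_sum(estimates, threshold=0.5)
truly_informative = {0, 3}  # assumed ground truth, as in a simulation study
print(selected)
print(hit_rate(selected, truly_informative))
print(false_alarm_rate(selected, truly_informative, n_predictors=4))
```

In a simulation setting the true informative set is known, which is what makes both rates computable; with real data only the selected set is available.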


Bibliographic record

Source
Statistical Methods and Applications | 2020, Issue 3 | pp. 553-580 | 28 pages
Authors
Faisal Maqbool Zahid; Shahla Faisal; Christian Heumann
Affiliations

Department of Statistics, Government College University Faisalabad, Faisalabad, Pakistan;

Department of Statistics, Government College University Faisalabad, Faisalabad, Pakistan;

Department of Statistics, Ludwig-Maximilians-University Munich, Munich, Germany

Original format: PDF
Language: English
Keywords
High-dimensional data; Multiple imputation; LASSO; Rubin's rules; Variable selection

Date added: 2022-08-18 21:23:19

Similar literature


1. A combination of variable selection and data mining techniques for high-dimensional statistical modelling [J]. Christos Koukouvinos, Kalliopi Mylona, Christina Parpoula. International Journal of Information and Decision Sciences. 2013, Issue 2

2. Integrative analysis and variable selection with multiple high-dimensional data sets [J]. Ma S., Huang J., Song X. Biostatistics. 2011, Issue 4

3. Integrative analysis and variable selection with multiple high-dimensional data sets [J]. Shuangge Ma. Biostatistics. 2011, Issue 4

4. Genetic Programming for Imputation Predictor Selection and Ranking in Symbolic Regression with High-Dimensional Incomplete Data [C]. Baligh Al-Helali, Qi Chen, Bing Xue. Australasian Joint Conference on Artificial Intelligence. 2019

5. Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model [D]. Lu, Xiang. 2016

6. Integrative analysis and variable selection with multiple high-dimensional data sets [O]. Shuangge Ma, Jian Huang, Xiao Song -1

7. Integrative analysis and variable selection with multiple high-dimensional data sets [O]. S. Ma, J. Huang, X. Song. 2011
