Journal: Model Assisted Statistics and Applications

Variance estimation by multivariate imputation methods in complex survey designs


Abstract

In this paper, we consider variance estimation of the sample mean when missing data have been imputed with multivariate imputation methods. Modern multivariate imputation methods for missing data are complicated and computationally expensive. These methods do not require a normality assumption to impute the missing values. Under this assumption-free condition, we compare the variance estimation performance of six modern multivariate imputation methods, including copula imputation, random forest imputation, principal component analysis imputation, and k-nearest neighbors imputation, in complex sampling designs such as stratified sampling and cluster sampling, and with resampling approaches to variance estimation (jackknife and bootstrap) in stratified sampling. We conducted simulation studies using National Health and Nutrition Survey data with 5% and 15% missing completely at random (MCAR) rates. Based on a simulation study with 500 resamples of the mean squared error of the sample mean in complex survey designs, the percent relative efficiency (RE(%)) indicates that the random forest (RF) imputation method appears to outperform the other imputation methods overall when the data have high skewness at the 5% missing rate and when the data have high excess kurtosis at the 15% missing rate, whereas the principal component analysis (PCA) imputation method appears to outperform the other imputation methods when the data have high skewness at the 5% and 15% missing rates. In particular, the multivariate imputation methods appear to be efficient in terms of RE(%) in the cluster sampling design when the data have high skewness or excess kurtosis at the 15% missing rate.
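
As a rough illustration of the workflow the abstract describes (impose MCAR missingness, impute, then estimate the variance of the sample mean by jackknife and bootstrap in a stratified design), the following minimal Python sketch may help. It is not the authors' code. The data are synthetic and right-skewed, the imputer is a simple one-covariate k-nearest-neighbors rule standing in for the methods named above, and the stratum weights, stratum sizes, and all function names are hypothetical. Finite population corrections are ignored, and imputed values are treated as if they were observed.

```python
import math
import random
import statistics


def impose_mcar(values, rate, rng):
    """Set each value to None independently with probability `rate` (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def knn_impute(y, x, k=5):
    """Replace each missing y with the mean y of the k donors nearest in x."""
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            nearest = sorted(donors, key=lambda d: abs(d[0] - xi))[:k]
            yi = statistics.fmean([d[1] for d in nearest])
        filled.append(yi)
    return filled


def stratified_mean(strata, weights):
    """Weighted sum of stratum means; weights are population shares W_h."""
    return sum(w * statistics.fmean(s) for w, s in zip(weights, strata))


def jackknife_variance(strata, weights):
    """Delete-one stratified jackknife variance of the stratified mean."""
    full = stratified_mean(strata, weights)
    var = 0.0
    for h, s in enumerate(strata):
        n_h, total = len(s), sum(s)
        for y_j in s:
            # Replicate estimate: drop unit j from stratum h only.
            mean_without_j = (total - y_j) / (n_h - 1)
            replicate = full + weights[h] * (mean_without_j - total / n_h)
            var += (n_h - 1) / n_h * (replicate - full) ** 2
    return var


def bootstrap_variance(strata, weights, n_rep, rng):
    """Naive stratified bootstrap: resample n_h units with replacement."""
    reps = []
    for _ in range(n_rep):
        resampled = [rng.choices(s, k=len(s)) for s in strata]
        reps.append(stratified_mean(resampled, weights))
    return statistics.variance(reps)


rng = random.Random(0)
weights = [0.5, 0.3, 0.2]  # hypothetical stratum population shares
sizes = [60, 40, 30]       # hypothetical stratum sample sizes
strata = []
for n_h in sizes:
    x = [rng.gauss(0.0, 1.0) for _ in range(n_h)]                # auxiliary variable
    y = [math.exp(0.5 * xi) + rng.expovariate(1.0) for xi in x]  # right-skewed outcome
    y_missing = impose_mcar(y, 0.05, rng)                        # 5% MCAR rate
    strata.append(knn_impute(y_missing, x, k=5))

print("stratified mean:", stratified_mean(strata, weights))
print("jackknife variance:", jackknife_variance(strata, weights))
print("bootstrap variance:", bootstrap_variance(strata, weights, 500, rng))
```

For the stratified mean, the delete-one jackknife reproduces the textbook estimator (the sum over strata of W_h² s_h² / n_h), which gives a quick sanity check. The naive bootstrap shown here tends to run slightly below that value in small strata, so a rescaled variant may be preferable in practice.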
