Journal: BMC Medical Research Methodology

A nonparametric multiple imputation approach for missing categorical data

Abstract

Background: Incomplete categorical variables with more than two categories are common in public health data. However, most existing missing-data methods do not use the information from nonresponse (missingness) probabilities.

Methods: We propose a nearest-neighbour multiple imputation approach to impute a categorical outcome that is missing at random and to estimate the proportion of each category. The donor set for imputation is formed by measuring the distance between each missing value and the non-missing values. The distance function is calculated from a predictive score, which is derived from two working models: one fits a multinomial logistic regression to predict the missing categorical outcome (the outcome model), and the other fits a logistic regression to predict missingness probabilities (the missingness model). A weighting scheme is used to accommodate the contributions from the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented.

Results: The simulation study suggests that the proposed method performs well under some misspecifications of the working models when missingness probabilities are not extreme. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method.

Conclusions: We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels when assessing the distribution of the outcome. For the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
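The imputation step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the predictive scores from the two working models have already been computed and standardized, and it uses a weighted Euclidean distance, which is an assumption because the abstract does not give the exact distance formula. The names `weighted_distance`, `nn_multiple_imputation`, `w_outcome`, and `n_donors` are hypothetical, and the variance estimation needed for inference is omitted.

```python
import math
import random
from collections import Counter


def weighted_distance(score_a, score_b, w_outcome):
    """Distance between the predictive scores of two observations.

    Each score is a pair (outcome_scores, missingness_score), where
    outcome_scores is a tuple of floats. The weighted Euclidean form
    is an assumption made for this sketch.
    """
    out_a, miss_a = score_a
    out_b, miss_b = score_b
    d_out = sum((a - b) ** 2 for a, b in zip(out_a, out_b))
    d_miss = (miss_a - miss_b) ** 2
    return math.sqrt(w_outcome * d_out + (1.0 - w_outcome) * d_miss)


def nn_multiple_imputation(y, scores, w_outcome=0.5, n_donors=3,
                           n_imputations=5, seed=0):
    """Estimate category proportions by nearest-neighbour imputation.

    y      : list of category labels, with None marking a missing value
    scores : one (outcome_scores, missingness_score) pair per observation
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    missing = [i for i, v in enumerate(y) if v is None]
    categories = sorted({y[i] for i in observed})

    # Donor set: the n_donors observed cases closest to each missing case.
    donors = {}
    for i in missing:
        ranked = sorted(
            observed,
            key=lambda j: weighted_distance(scores[i], scores[j], w_outcome),
        )
        donors[i] = ranked[:n_donors]

    # Impute repeatedly and average the per-dataset proportions.
    totals = Counter()
    for _ in range(n_imputations):
        completed = list(y)
        for i in missing:
            completed[i] = y[rng.choice(donors[i])]
        counts = Counter(completed)
        for c in categories:
            totals[c] += counts[c] / len(y)
    return {c: totals[c] / n_imputations for c in categories}


# Toy data (made up for illustration): the last outcome is missing.
toy_y = ["a", "a", "b", None]
toy_scores = [((0.0,), 0.0), ((0.1,), 0.0), ((1.0,), 1.0), ((0.9,), 0.9)]
print(nn_multiple_imputation(toy_y, toy_scores, n_donors=1))
```

Setting `w_outcome` closer to 1 makes donor selection rely more on the outcome model, while values closer to 0 favour the missingness model. This mirrors the abstract's point that the weights need to be chosen with care.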
