A nonparametric multiple imputation approach for missing categorical data

Muhan Zhou; Yulei He; Mandi Yu; Chiu-Hsieh Hsu

Abstract

Background: Incomplete categorical variables with more than two categories are common in public health data. However, most existing missing-data methods do not use information from the nonresponse (missingness) probabilities.

Methods: We propose a nearest-neighbour multiple imputation approach to impute a categorical outcome that is missing at random and to estimate the proportion of each category. The donor set for imputation is formed by measuring the distance between each missing value and the non-missing values. The distance function is calculated from a predictive score derived from two working models: one fits a multinomial logistic regression to predict the missing categorical outcome (the outcome model), and the other fits a logistic regression to predict the missingness probabilities (the missingness model). A weighting scheme balances the contributions of the two working models when generating the predictive score. Each missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation study to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented.

Results: The simulation study suggests that the proposed method performs well under some misspecifications of the working models, provided the missingness probabilities are not extreme. However, the calibration estimator, which is also based on two working models, can be highly unstable when the missingness probabilities of some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, appropriate weights must be chosen to balance the contributions of the two working models and achieve optimal results with the proposed method.
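The imputation step described under Methods can be sketched in Python as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: it assumes both working models have already been fitted and supply per-subject scores (for example, outcome-model linear predictors and the logit of the missingness probability), standardizes each score component, combines them in a weighted Euclidean distance with weight `w` on the outcome model and `1 - w` on the missingness model, and draws each imputation at random from the `k` nearest observed donors. The function names `nn_multiple_impute` and `pool_proportion`, and the exact distance form, are hypothetical. For brevity the sketch does not refit the working models on bootstrap samples between imputations, which a proper multiple imputation procedure would typically do to reflect model uncertainty. Category proportions are pooled with the standard Rubin's rules, assumed here as the combining step.

```python
import math
import random
from collections import Counter


def standardize(values):
    """Center and scale scores to mean 0, SD 1 (an assumed preprocessing step)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)) or 1.0
    return [(v - mean) / sd for v in values]


def nn_multiple_impute(y, outcome_scores, missing_scores, w=0.5, k=5, m=5, seed=0):
    """Hypothetical predictive-score nearest-neighbour multiple imputation.

    y              -- category labels, None where the outcome is missing
    outcome_scores -- per-subject list of outcome-model scores (assumed input)
    missing_scores -- per-subject missingness-model score (assumed input)
    w              -- weight on the outcome model; 1 - w goes to the missingness model
    k              -- number of nearest donors; m -- number of imputations
    """
    rng = random.Random(seed)
    n = len(y)
    n_out = len(outcome_scores[0])
    out_cols = [standardize([s[j] for s in outcome_scores]) for j in range(n_out)]
    miss_col = standardize(missing_scores)
    donors = [i for i in range(n) if y[i] is not None]
    recipients = [i for i in range(n) if y[i] is None]

    def distance(i, j):
        # Outcome-model components are averaged so that w trades off the two
        # models on a comparable scale (an assumption of this sketch).
        d_out = sum((col[i] - col[j]) ** 2 for col in out_cols) / n_out
        d_miss = (miss_col[i] - miss_col[j]) ** 2
        return math.sqrt(w * d_out + (1 - w) * d_miss)

    # Donor set for each missing subject: the k observed subjects closest to it.
    donor_sets = {i: sorted(donors, key=lambda j: distance(i, j))[:k] for i in recipients}

    completed = []
    for _ in range(m):
        y_imp = list(y)
        for i in recipients:
            # Impute by drawing one donor at random from the nearest neighbours.
            y_imp[i] = y[rng.choice(donor_sets[i])]
        completed.append(y_imp)
    return completed


def pool_proportion(completed, category):
    """Pool one category's proportion across imputations with Rubin's rules."""
    m = len(completed)
    n = len(completed[0])
    estimates = [Counter(data)[category] / n for data in completed]
    within = [p * (1 - p) / n for p in estimates]
    q_bar = sum(estimates) / m
    u_bar = sum(within) / m
    b = sum((p - q_bar) ** 2 for p in estimates) / (m - 1)
    # Returns the pooled estimate and its total variance W + (1 + 1/m) B.
    return q_bar, u_bar + (1 + 1 / m) * b
```

With `k=1` every missing value copies the label of its single closest observed neighbour in every imputation, so the between-imputation variance is zero; a larger `k` lets the draws differ across imputations. Setting `w=1` ignores the missingness model entirely and `w=0` ignores the outcome model, which mirrors the trade-off the Results paragraph describes.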
Conclusions: We conclude that the proposed multiple imputation method is a reasonable approach for handling a missing categorical outcome with more than two levels when the goal is to assess the distribution of the outcome. For the working models, we suggest a multinomial logistic regression to predict the missing outcome and a binary logistic regression to predict the missingness probability.
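The two working models recommended above can be fitted with any standard package (for example, statsmodels provides `Logit` and `MNLogit`). The sketch below instead fits both by plain gradient ascent on the log-likelihood so that it runs with the Python standard library alone; the function names, learning rate and iteration count are hypothetical choices for illustration, not the authors' implementation. The missingness model is fitted on all subjects with a 0/1 missingness indicator, while the outcome model is fitted only on subjects whose outcome is observed, using the first category as the reference.

```python
import math


def fit_logistic(X, r, lr=0.1, iters=2000):
    """Binary logistic regression for P(missing | x) by gradient ascent.

    X -- rows of covariates (an intercept column is added here)
    r -- 1 if the outcome is missing, 0 if observed (all subjects)
    """
    Xb = [[1.0] + list(row) for row in X]
    beta = [0.0] * len(Xb[0])
    n = len(Xb)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for row, ri in zip(Xb, r):
            p = 1 / (1 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for j, x in enumerate(row):
                grad[j] += (ri - p) * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def fit_multinomial(X, y, categories, lr=0.1, iters=2000):
    """Multinomial logistic regression with categories[0] as the reference.

    Fit on the subjects whose outcome is observed; returns one coefficient
    vector per non-reference category.
    """
    Xb = [[1.0] + list(row) for row in X]
    dim = len(Xb[0])
    betas = [[0.0] * dim for _ in categories[1:]]
    n = len(Xb)
    for _ in range(iters):
        grads = [[0.0] * dim for _ in betas]
        for row, yi in zip(Xb, y):
            etas = [sum(b * x for b, x in zip(beta, row)) for beta in betas]
            denom = 1 + sum(math.exp(e) for e in etas)
            for c in range(len(betas)):
                target = 1.0 if yi == categories[c + 1] else 0.0
                resid = target - math.exp(etas[c]) / denom
                for j, x in enumerate(row):
                    grads[c][j] += resid * x
        betas = [[b + lr * g / n for b, g in zip(beta, grad)]
                 for beta, grad in zip(betas, grads)]
    return betas


def missing_prob(beta, x):
    """Fitted missingness probability for covariates x."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1 / (1 + math.exp(-eta))


def outcome_probs(betas, x):
    """Fitted category probabilities, reference category first."""
    etas = [beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)) for beta in betas]
    denom = 1 + sum(math.exp(e) for e in etas)
    return [1 / denom] + [math.exp(e) / denom for e in etas]
```

For instance, with an intercept-only design and outcomes `["a", "a", "b", "c"]`, the fitted category probabilities approach the sample proportions 0.5, 0.25 and 0.25. The per-subject linear predictors or probabilities from these fits are the kind of scores a predictive-score distance can be built from.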

Bibliographic details

Source
BMC Medical Research Methodology | 2017, Issue 1
Authors
Muhan Zhou; Yulei He; Mandi Yu; Chiu-Hsieh Hsu
Original format: PDF
Chinese Library Classification: Medicine and health

Similar documents

1. A nonparametric multiple imputation approach for data with missing covariate values with application to colorectal adenoma data [J]. Chiu-Hsieh Hsu, Qi Long, Yisheng Li. Journal of Biopharmaceutical Statistics, 2014, No. 3.

2. Latent class based multiple imputation approach for missing categorical data [J]. Gebregziabher M., DeSantis S.M. Journal of Statistical Planning and Inference, 2010, No. 11.

3. Multiple time period imputation technique for multiple missing traffic variables: nonparametric regression approach [J]. Hyunho Chang, Dongjoo Park, Younginn Lee. Canadian Journal of Civil Engineering, 2012, No. 4.

4. A novel nonparametric multiple imputation algorithm for estimating missing data [C]. Iffat A. Gheyas, Leslie S. Smith. World Congress on Engineering, 2009.

5. The impact of missing data treatments in a multiple regression analysis: A Monte Carlo comparison of deterministic imputation, stochastic imputation, multiple imputation, and the deletion procedures [D]. Newsome, Dwight Howard, 1996.

6. A nonparametric multiple imputation approach for missing categorical data [O]. Muhan Zhou, Yulei He, Mandi Yu, 2017.
