Multiple imputation for missing data and statistical disclosure control for mixed-mode data using a sequence of generalised linear models

Abstract

Multiple imputation is a commonly used approach to deal with missing data and to protect the confidentiality of public use data sets. The basic idea is to replace the missing or sensitive values with multiple imputations and then release the multiply imputed data sets to the public. Users can analyze the multiply imputed data sets and obtain valid inferences by using simple combining rules, which take into account the uncertainty due to the presence of missing values and synthetic values. It is crucial that imputations are drawn from the posterior predictive distribution, so that relationships present in the data are preserved and valid conclusions can be drawn from any analysis. In data sets with different types of variables, e.g. some categorical and some continuous, multivariate imputation by chained equations (MICE; Van Buuren, 2011) is a commonly used multiple imputation method. However, imputations from such an approach are not necessarily drawn from a proper posterior predictive distribution. We propose a method, called the factored regression model (FRM), to multiply impute missing values in such data sets by modelling the joint distribution of the variables through a sequence of generalised linear models. We use data augmentation methods to connect the categorical and continuous variables, which allows us to draw imputations from a proper posterior distribution. We compare the performance of our method with MICE using simulation studies and a breastfeeding data set. We also extend our modelling strategies to incorporate different informative priors for the FRM, in order to explore robust regression modelling and sparse relationships between the predictors. We then apply our model to protect the confidentiality of the Current Population Survey (CPS) data by generating multiply imputed, partially synthetic data sets. These data sets comprise a mix of original and synthetic data, where the values chosen for synthesis are selected by an approach that considers unique and sensitive units in the survey. Valid inferences can then be made using the combining rules described by Reiter (2003). An extension to the modelling strategy is also introduced to deal with the presence of spikes at zero in some of the continuous variables in the CPS data.
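
The combining rules mentioned in the abstract can be illustrated with a minimal sketch. It assumes a scalar quantity of interest, with one point estimate and one variance estimate computed from each of the m released data sets. The function name `combine_estimates` and its arguments are hypothetical and do not come from the thesis. The formulas are the standard Rubin rules for multiply imputed missing data and the Reiter (2003) rules for partially synthetic data, which differ only in how the between-imputation variance enters the total.

```python
import math
from statistics import mean, variance


def combine_estimates(estimates, variances, partially_synthetic=False):
    """Combine per-data-set estimates of a scalar quantity (illustrative helper).

    estimates: point estimates q_1..q_m, one per imputed or synthetic data set
    variances: the matching variance estimates u_1..u_m
    partially_synthetic: False -> Rubin's rules for missing data,
                         True  -> Reiter (2003) rules for partially synthetic data
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two data sets and one variance per estimate")

    q_bar = mean(estimates)   # combined point estimate
    u_bar = mean(variances)   # average within-data-set variance
    b = variance(estimates)   # between-data-set variance (divisor m - 1)

    # Missing data:        T = u_bar + (1 + 1/m) * b
    # Partially synthetic: T = u_bar + b / m
    between = b / m if partially_synthetic else (1 + 1 / m) * b
    total_var = u_bar + between

    # Degrees of freedom of the reference t distribution.
    df = (m - 1) * (1 + u_bar / between) ** 2 if between > 0 else math.inf
    return q_bar, total_var, df
```

For example, `combine_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` pools three analyses under the missing-data rules, and passing `partially_synthetic=True` applies the synthetic-data variance instead. An interval estimate would then use `q_bar` plus or minus a t quantile with `df` degrees of freedom times the square root of `total_var`.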

Bibliographic record

Author: Lee Min Cherng
Year: 2014
Original format: PDF
Language: English

Similar documents

1. Combining multiple imputation and control function methods to deal with missing data and endogeneity in discrete-choice models [J]. Gopalakrishnan Raja, Guevara C. Angelo, Ben-Akiva Moshe. Transportation Research Part B: Methodological. 2020, issue Deca

2. Imputation of missing variance data using non-linear mixed effects modelling to enable an inverse variance weighted meta-analysis of summary-level longitudinal data: A case study [J]. Boucher M. Pharmaceutical Statistics. 2012, issue 4

3. Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable [J]. Ibrahim JG., Lipsitz SR., Chen MH. Biometrika. 2001, issue 2

4. Augmented stochastic multiple imputation model for airport pavement missing data imputation [C]. J. Farhan, T. F. Fwa. Annual Meeting of the Transportation Research Board; Transportation Research Board. 2014

5. The Robustness of Multilevel Multiple Imputation for Handling Missing Data in Hierarchical Linear Models [D]. Medhanie, Amanuel Gebri. 2013

6. Universal Linear Fit Identification: A Method Independent of Data Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation [O]. K. K. L. B. Adikaram, M. A. Hussein, M. Effenberger

7. Multiple imputation for missing binary item scores in multilevel cross-classified educational data when the Analysis and Imputation models differ [O]. Kadengye Damazo Twebaze, Ceulemans Eva, Van Den Noortgate Wim. 2012
