Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

Anoop D Shah; Jonathan W Bartlett; James Carpenter; Owen Nicholas; Harry Hemingway

Abstract

Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.
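The second simulation's point, that an imputation model which is linear in the fully observed variables can bias estimates when the true dependence is nonlinear, can be illustrated with a small sketch. This is not the authors' code or their random forest MICE algorithm. It assumes one fully observed variable x, one partially observed variable y = x^2 + noise, and a logistic missing-at-random mechanism. The nonparametric method is a simple nearest-neighbour donor draw used as a stand-in for random forest imputation. All function names and parameter values are hypothetical.

```python
import bisect
import math
import random
import statistics


def simulate(n=2000, seed=0):
    """Return x (fully observed), complete y, a MAR missingness mask, and the RNG."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    # Assumed data-generating model: y depends on x in a nonlinear (quadratic) way.
    y = [xi * xi + rng.gauss(0.0, 0.5) for xi in x]
    # Missing at random: the chance that y is missing depends only on observed x.
    missing = [rng.random() < 1.0 / (1.0 + math.exp(-1.5 * xi)) for xi in x]
    return x, y, missing, rng


def impute_linear(x, y, missing, rng):
    """Parametric imputation: linear regression of y on x plus a residual draw.

    Simplification: proper multiple imputation would also draw the regression
    parameters; here they are fixed at their least-squares estimates.
    """
    xo = [xi for xi, m in zip(x, missing) if not m]
    yo = [yi for yi, m in zip(y, missing) if not m]
    mx, my = statistics.fmean(xo), statistics.fmean(yo)
    sxx = sum((a - mx) ** 2 for a in xo)
    slope = sum((a - mx) * (b - my) for a, b in zip(xo, yo)) / sxx
    intercept = my - slope * mx
    resid_sd = statistics.pstdev(
        [b - (intercept + slope * a) for a, b in zip(xo, yo)]
    )
    return [
        intercept + slope * xi + rng.gauss(0.0, resid_sd) if m else yi
        for xi, yi, m in zip(x, y, missing)
    ]


def impute_donor(x, y, missing, rng, k=5):
    """Nonparametric stand-in: copy y from a random donor among the k nearest
    observed cases in x. No functional form for y given x is specified."""
    observed = sorted((xi, yi) for xi, yi, m in zip(x, y, missing) if not m)
    xs = [p[0] for p in observed]
    out = []
    for xi, yi, m in zip(x, y, missing):
        if not m:
            out.append(yi)
            continue
        pos = bisect.bisect_left(xs, xi)
        # The k nearest neighbours always lie within k positions on either side.
        window = observed[max(0, pos - k):min(len(observed), pos + k)]
        nearest = sorted(window, key=lambda p: abs(p[0] - xi))[:k]
        out.append(rng.choice(nearest)[1])
    return out


def compare(m=5, seed=0):
    """Error in the estimated mean of y, averaged over m imputed data sets."""
    x, y, missing, rng = simulate(seed=seed)
    truth = statistics.fmean(y)  # known only because the data are simulated
    lin = statistics.fmean(
        [statistics.fmean(impute_linear(x, y, missing, rng)) for _ in range(m)]
    )
    don = statistics.fmean(
        [statistics.fmean(impute_donor(x, y, missing, rng)) for _ in range(m)]
    )
    return lin - truth, don - truth


if __name__ == "__main__":
    lin_err, don_err = compare()
    print(f"linear imputation error: {lin_err:+.3f}")
    print(f"donor imputation error:  {don_err:+.3f}")
```

In this setup the linear model extrapolates a slope fitted mostly on low values of x into the region where y is mostly missing, so the estimated mean of y is pulled away from the full-data value, while the donor method follows the local shape of the data. The sketch estimates a simple mean rather than the (log) hazard ratios studied in the paper, and it does not compute confidence intervals or coverage.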

Bibliographic details

Source
American Journal of Epidemiology, 2014, issue 6, 11 pages
Authors
Anoop D Shah; Jonathan W Bartlett; James Carpenter; Owen Nicholas; Harry Hemingway
Original format: PDF
Language: English
Chinese Library Classification: Epidemiology and epidemic prevention