Predicting Missing Values in Medical Data Via XGBoost Regression

X. Zhang; C. Yan; C. Gao; B.A. Malin; Y. Chen

Abstract

The data in a patient's laboratory test results are a notable resource for supporting clinical investigation and enhancing medical research. However, for a variety of reasons, this type of data often contains a non-trivial number of missing values. For example, physicians may neglect to order tests or document the results. Such missingness reduces the degree to which the data can be used to learn efficient and effective predictive models. To address this problem, various approaches have been developed to impute missing laboratory values; however, their performance has been limited. This is due, in part, to the fact that no existing approach effectively leverages the contextual information (1) within an individual laboratory test variable or (2) between laboratory test variables. We introduce an approach that combines an unsupervised prefilling strategy with a supervised machine learning approach, in the form of extreme gradient boosting (XGBoost), to leverage both types of context for imputation purposes. We evaluated the methodology through a series of experiments on approximately 8200 patients' records in the MIMIC-III dataset. The results demonstrate that the new model outperforms baseline and state-of-the-art models on 13 commonly collected laboratory test variables. In terms of the normalized root mean square deviation (nRMSD), our model improves imputation by over 20%, on average. Missing data imputation on temporal variables can be substantially improved via a prefilling strategy and a supervised training technique, which together leverage the longitudinal and cross-sectional context simultaneously.
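
The abstract does not spell out the prefilling rule, the feature layout, or the exact nRMSD formula, so the following Python sketch is illustrative only. It assumes linear interpolation over time as the unsupervised prefill (carrying the nearest observed value at the edges), a hypothetical feature row that joins the neighbouring values of the target test (longitudinal context) with the other tests at the same time point (cross-sectional context), and an nRMSD normalized by the range of the true values. The names `prefill_linear`, `context_features`, and `nrmsd` are invented for this example and are not taken from the paper.

```python
from math import sqrt


def prefill_linear(times, values):
    """Fill None entries by linear interpolation in time (assumed prefill rule).

    `times` must be sorted ascending and distinct. Gaps at either end are
    filled with the nearest observed value.
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(float(v))
            continue
        before = [(ot, ov) for ot, ov in observed if ot < t]
        after = [(ot, ov) for ot, ov in observed if ot > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        elif before:
            filled.append(float(before[-1][1]))
        else:
            filled.append(float(after[0][1]))
    return filled


def context_features(prefilled, target, i):
    """Build one feature row for `target` at time index i (hypothetical layout).

    Longitudinal context: previous and next value of the target test.
    Cross-sectional context: all other tests at the same time index.
    """
    series = prefilled[target]
    prev_v = series[i - 1] if i > 0 else series[i]
    next_v = series[i + 1] if i + 1 < len(series) else series[i]
    others = [prefilled[name][i] for name in sorted(prefilled) if name != target]
    return [prev_v, next_v] + others


def nrmsd(true_values, imputed_values):
    """Root mean square deviation after scaling errors by the true-value range
    (an assumed normalization; the paper's definition may differ)."""
    lo, hi = min(true_values), max(true_values)
    if hi == lo:
        raise ValueError("true values must not all be equal")
    squared = [((p - t) / (hi - lo)) ** 2 for t, p in zip(true_values, imputed_values)]
    return sqrt(sum(squared) / len(squared))


# Toy example with made-up numbers: hours since admission and two lab tests.
times = [0, 6, 12, 24]
labs = {
    "creatinine": [1.0, None, 1.6, None],
    "bun": [14.0, 15.0, None, 20.0],
}
prefilled = {name: prefill_linear(times, vals) for name, vals in labs.items()}
row = context_features(prefilled, "creatinine", 1)
```

In a full pipeline, rows like `row` would be collected for every time point where the target test was actually observed, and a gradient-boosted regressor such as XGBoost would be fitted on them; its predictions would then replace the prefilled values at the missing positions.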

Bibliographic record

Source
Journal of Healthcare Informatics Research | 2020, Issue 4 | pp. 383-394 | 12 pages
Authors
X. Zhang; C. Yan; C. Gao; B.A. Malin; Y. Chen
Author affiliation

Vanderbilt University, Nashville, TN, USA;
Original format: PDF
Language: English
Chinese Library Classification: Relationship between medicine and other disciplines
Keywords
Missing values; Imputation; XGBoost; Laboratory tests
