eHealth Conference

Cleansing and Imputation of Body Mass Index Data and Its Impact on a Machine Learning Based Prediction Model

Abstract

Background: Data quality is a challenge when electronic health records are used for secondary analyses. Body mass index (BMI) is an important predictor for various diseases, but it is often not documented properly.

Objectives: Our study aims to cleanse BMI values and to identify the best method for imputing missing values in order to increase data quality. We further assess how changes in data quality affect the performance of a machine learning based prediction model.

Methods: After cleansing the BMI data, we compared machine learning and statistical methods by the accuracy of their imputed values, measured as root mean square error (RMSE). In a second step, we used three variants of the BMI data as training sets for a model predicting the occurrence of delirium.

Results: Neural network and linear regression models performed best for imputation. Model performance did not change across the different BMI input variants.

Conclusion: Although data quality issues may lead to biases, they do not always affect the performance of secondary analyses.
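The cleansing and imputation-comparison steps described in the Methods can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: implausible BMI values are flagged with a hypothetical range of 10 to 80 kg/m², a random share of known values is masked so imputed values can be scored against the true ones, and only two illustrative methods are compared (mean imputation as a statistical baseline and a one-predictor linear regression on a hypothetical body-weight variable). The neural network models mentioned in the Results are omitted. RMSE is the square root of the mean squared difference between true and imputed values.

```python
import math
import random

# Hypothetical plausibility bounds (kg/m^2); the paper's actual cleansing rules are not given here.
BMI_MIN, BMI_MAX = 10.0, 80.0


def cleanse_bmi(values):
    """Replace missing or implausible BMI values with None (treated as missing)."""
    return [v if v is not None and BMI_MIN <= v <= BMI_MAX else None for v in values]


def rmse(true_vals, pred_vals):
    """Root mean square error between two equal-length sequences."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals)) / n)


def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a + b * x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def evaluate_imputation(predictor, bmi, mask_fraction=0.2, seed=0):
    """Mask a share of known BMI values, impute them, and return the RMSE per method."""
    rng = random.Random(seed)
    known = [i for i, v in enumerate(bmi) if v is not None]
    held_out = sorted(rng.sample(known, max(1, int(len(known) * mask_fraction))))
    held_set = set(held_out)
    train = [i for i in known if i not in held_set]

    # Statistical baseline: impute the mean of the remaining known values.
    mean_bmi = sum(bmi[i] for i in train) / len(train)
    # Regression imputation on one hypothetical predictor (e.g. body weight).
    a, b = fit_linear([predictor[i] for i in train], [bmi[i] for i in train])

    truth = [bmi[i] for i in held_out]
    mean_pred = [mean_bmi] * len(truth)
    lin_pred = [a + b * predictor[i] for i in held_out]
    return {"mean": rmse(truth, mean_pred), "linear_regression": rmse(truth, lin_pred)}


if __name__ == "__main__":
    # Synthetic data only: weights in kg and BMI for a fixed hypothetical height of 1.75 m.
    rng = random.Random(1)
    weights = [rng.uniform(50, 110) for _ in range(200)]
    bmi = cleanse_bmi([w / 1.75 ** 2 + rng.gauss(0, 1.0) for w in weights])
    print(evaluate_imputation(weights, bmi))
```

Masking values that are actually known is one common way to make imputation accuracy measurable, since RMSE needs a ground truth; with real records one would typically use several predictors and repeat the masking over multiple seeds.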