IEEE International Conference on Big Knowledge (conference paper)

Don't Do Imputation: Dealing with Informative Missing Values in EHR Data Analysis



Abstract

Missing values pose a significant challenge in data analytics, especially in clinical studies, where data are typically missing-not-at-random (MNAR). Applying techniques designed for data that are missing-at-random (MAR), such as imputation, to MNAR data can introduce bias. In this work, we propose pattern-wise analysis, a collection of methods for building predictive models in the presence of MNAR missing values. The methodology constructs a separate model for each missingness pattern. We show that even the simplest pattern-wise method, Per-Pattern Modeling (PPM), outperforms models built on data sets completed by the most popular imputation methods. PPM faces difficulty when the number of missingness patterns is too high or when individual patterns have too few observations. We developed variants of PPM that overcome these challenges from three complementary perspectives: (i) a model selection perspective, where PPM selects the patterns for which to build models; (ii) a distributional perspective, where the training data set is expanded in a distribution-preserving fashion; and (iii) a causal perspective, where a causal structure for the MNAR mechanism is assumed and exploited to convert the problem from MNAR to MAR. Evaluation of the proposed methods on both synthetic MNAR data and a real-world clinical data set of sepsis patients shows notable improvement over traditional approaches.
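As a rough illustration of the per-pattern idea, the sketch below groups training rows by their missingness pattern (the tuple of observed feature indices) and fits one model per group using only that group's observed features, so nothing is imputed. The base learner is a toy nearest-centroid classifier chosen purely for brevity; the abstract does not say which learner PPM uses. The `min_count` threshold and the fallback that routes an unmodeled pattern to the best-supported model whose features are all observed are hypothetical, loosely inspired by the model-selection variant, and are not the paper's actual selection rule.

```python
from collections import defaultdict
from math import dist  # Python 3.8+


def pattern_of(row):
    """Missingness pattern of a row: tuple of indices of observed (non-None) features."""
    return tuple(i for i, v in enumerate(row) if v is not None)


class NearestCentroid:
    """Toy base learner (an assumption): predict the label whose class centroid is closest."""

    def fit(self, xs, ys):
        sums, counts = {}, defaultdict(int)
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: dist(x, self.centroids[y]))


class PerPatternModel:
    """Sketch of per-pattern modeling: one model per missingness pattern, no imputation.

    min_count and the subset-based fallback in predict() are hypothetical additions.
    """

    def __init__(self, min_count=2):
        self.min_count = min_count
        self.models = {}  # pattern -> (fitted model, number of training rows)

    def fit(self, rows, labels):
        groups = defaultdict(lambda: ([], []))
        for row, y in zip(rows, labels):
            p = pattern_of(row)
            groups[p][0].append([row[i] for i in p])  # keep observed features only
            groups[p][1].append(y)
        for p, (xs, ys) in groups.items():
            if len(xs) >= self.min_count:  # skip patterns with too few rows
                self.models[p] = (NearestCentroid().fit(xs, ys), len(xs))
        return self

    def predict(self, row):
        p = pattern_of(row)
        if p in self.models:
            used = p
        else:
            # Hypothetical fallback: among models whose features are all observed
            # in this row, use the one trained on the most rows.
            candidates = [q for q in self.models if set(q) <= set(p)]
            if not candidates:
                raise ValueError(f"no model usable for pattern {p}")
            used = max(candidates, key=lambda q: self.models[q][1])
        model, _ = self.models[used]
        return model.predict([row[i] for i in used])


rows = [[1.0, None], [1.2, None], [5.0, None], [5.2, None],  # pattern (0,)
        [None, 10.0], [None, 50.0],                          # pattern (1,)
        [1.1, 11.0]]                                         # pattern (0, 1): too rare
labels = [0, 0, 1, 1, 0, 1, 0]
ppm = PerPatternModel(min_count=2).fit(rows, labels)
print(ppm.predict([4.8, 12.0]))  # 1, scored by the pattern-(0,) model via the fallback
```

Here the fully observed pattern `(0, 1)` has a single training row, so no model is built for it; a fully observed test row is instead scored by the four-row model for pattern `(0,)`, which looks only at the first feature. A row with no observed features raises an error because no model can use it.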
