Journal: Statistics and Computing

Imputation and low-rank estimation with Missing Not At Random data

Abstract

Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion methods based on low-rank assumptions are a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values, which are widely encountered in practice. This paper proposes matrix completion methods to recover Missing Not At Random (MNAR) data. Our first contribution is to suggest a model-based estimation strategy by modelling the missing mechanism distribution. An EM algorithm is then implemented, involving a Fast Iterative Soft-Thresholding Algorithm (FISTA). Our second contribution is to suggest a computationally efficient surrogate estimation by implicitly taking into account the joint distribution of the data and the missing mechanism: the data matrix is concatenated with the mask coding for the missing values; a low-rank structure for the exponential family is assumed on this new matrix, in order to encode links between variables and missing mechanisms. The methodology, which has the great advantage of handling different missing value mechanisms, is robust to model specification errors. The performance of our methods is assessed on real data collected from a trauma registry (TraumaBase®) containing clinical information about over twenty thousand severely traumatized patients in France. The aim is then to predict whether doctors should administer tranexamic acid to patients with traumatic brain injury, which would limit excessive bleeding.
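
The second contribution (concatenating the data matrix with the missing-value mask and assuming a low-rank structure on the result) can be illustrated with a rough sketch. This is not the authors' implementation: it assumes a plain quadratic loss on both blocks instead of an exponential-family likelihood, and it swaps in a simple iterative soft-thresholded SVD for the estimation step. The names `soft_threshold_svd`, `impute_with_mask`, `lam`, `n_iter` and `tol` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold_svd(Z, lam):
    """Shrink the singular values of Z by lam (prox of lam * nuclear norm)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - lam, 0.0)
    return (U * s) @ Vt


def impute_with_mask(X, lam=1.0, n_iter=200, tol=1e-6):
    """Impute NaN entries of X using a low-rank fit of [X | mask].

    Assumption: quadratic loss on both the data block and the mask block,
    a simplification of the exponential-family model in the abstract.
    """
    observed = ~np.isnan(X)
    mask = (~observed).astype(float)  # 1 = missing, 0 = observed
    p = X.shape[1]

    # Concatenate the zero-filled data with the fully observed mask.
    Y = np.hstack([np.where(observed, X, 0.0), mask])
    obs_all = np.hstack([observed, np.ones_like(observed)])

    theta = np.zeros_like(Y)
    for _ in range(n_iter):
        # Keep observed cells, plug the current estimate into missing ones.
        filled = np.where(obs_all, Y, theta)
        new_theta = soft_threshold_svd(filled, lam)
        change = np.linalg.norm(new_theta - theta)
        change /= max(np.linalg.norm(theta), 1e-12)
        theta = new_theta
        if change < tol:
            break

    # Return the data block, leaving observed values untouched.
    return np.where(observed, X, theta[:, :p])
```

Because the mask columns take part in the low-rank fit, rows with similar missingness patterns are pulled toward similar imputed values, which is the intuition behind encoding links between variables and missing mechanisms. In practice the columns would likely need scaling and `lam` would need tuning, for example by cross-validation on held-out observed entries.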