Conference proceedings: Advances in mathematical and computational methods

A new method of multiple imputation for completely (or almost completely) missing data

Abstract

One of the important questions a researcher must answer when assessing data quality while preparing information for a data mining procedure is whether the missing observations in the dataset are missing at random, and whether some form of imputation is needed. If all (or almost all) observations of a variable are missing, they cannot be classified as missing at random, so most known methods for imputing missing values cannot be applied to that variable. This paper studies a particular way of creating imputations in datasets that contain completely (or almost completely) missing variables. The paper shows that, if no external data are available, the maximum entropy distribution is the only reasonable probability distribution for producing proper imputations for such variables. Two examples from real-life epidemiological studies demonstrate the approach.
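
The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's actual procedure, which the abstract does not spell out. It assumes a categorical variable for which only the set of admissible values is known; under that assumption the maximum entropy distribution is the uniform one. The function names and the smoking-status categories are hypothetical and chosen only for illustration.

```python
import math
import random


def max_entropy_categorical(categories):
    """Return the maximum entropy distribution over a finite set of values.

    Assumption: nothing is known except the admissible values, in which
    case entropy is maximized by the uniform distribution.
    """
    p = 1.0 / len(categories)
    return {c: p for c in categories}


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution given as a dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def multiply_impute(n_rows, categories, m=5, seed=0):
    """Draw m independent imputed columns of length n_rows.

    Each column is one completed version of the fully missing variable.
    """
    rng = random.Random(seed)
    dist = max_entropy_categorical(categories)
    labels = list(dist)
    weights = [dist[c] for c in labels]
    return [rng.choices(labels, weights=weights, k=n_rows) for _ in range(m)]


# Hypothetical example: a smoking-status variable with every value missing.
imputations = multiply_impute(
    n_rows=8, categories=["never", "former", "current"], m=3
)
for i, column in enumerate(imputations, start=1):
    print(f"imputation {i}: {column}")
```

In a full multiple imputation analysis, each completed dataset would then be analyzed separately and the results pooled; that step is outside the scope of this sketch.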