International Journal of Health Policy and Management

Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example Using National Data on Drug Injection in Prisons

Abstract

Background: Policy makers need models that can detect groups at high risk of HIV infection. Incomplete records and dirty data are common in national data sets, and the presence of missing data complicates model development. Several studies have suggested that imputation methods perform acceptably when the missing rate is moderate. A less-studied issue, addressed here, is the role of the pattern of missing data.

Methods: We used data on 2720 prisoners. Results from fitting a regression model to the complete data set served as the gold standard. Missing data were then generated so that 10%, 20%, and 50% of the data were lost. In scenario 1, missing values were generated at these rates in a single variable that was significant in the gold-standard model (age). In scenario 2, a small proportion of each independent variable was removed. Four imputation methods were compared under different events-per-variable (EPV) values in terms of selection of important variables and parameter estimation.

Results: In scenario 2, bias in the estimates was low and all methods for handling missing data performed similarly; all methods detected the significance of age at all missing rates. In scenario 1, bias in the estimates increased, particularly at the 50% missing rate, and at EPVs of 10 and 5 the imputation methods failed to capture the effect of age.

Conclusion: In scenario 2, all imputation methods detected age as significant at all missing rates; this was not the case in scenario 1. Our results show that the performance of imputation methods depends on the pattern of missing data.
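The two missing-data patterns described in the Methods can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: values are deleted completely at random (MCAR), the variable names `x1` and `x2` and all helper functions are hypothetical, the data are synthetic, and in scenario 2 each affected record is assumed to lose exactly one value so that both scenarios produce the same number of incomplete records. EPV is computed as outcome events divided by candidate predictors, and bias is expressed relative to the complete-data (gold-standard) estimate.

```python
import random
from typing import Dict, List, Optional, Sequence

# A record maps variable names to values; None marks a missing value.
Row = Dict[str, Optional[float]]


def amputate_single(rows: List[Row], var: str, rate: float, seed: int = 0) -> List[Row]:
    """Scenario 1 (sketch, assumes MCAR): delete `var` in a random `rate` fraction of rows."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]  # copy so the complete "gold standard" data stay intact
    n_miss = round(rate * len(out))
    for i in rng.sample(range(len(out)), n_miss):
        out[i][var] = None
    return out


def amputate_spread(rows: List[Row], variables: Sequence[str], rate: float, seed: int = 0) -> List[Row]:
    """Scenario 2 (sketch, assumes MCAR): same number of incomplete rows as scenario 1,
    but each chosen row loses one value, cycling through `variables` so that each
    variable loses roughly rate / len(variables) of its values."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]
    n_miss = round(rate * len(out))
    for j, i in enumerate(rng.sample(range(len(out)), n_miss)):
        out[i][variables[j % len(variables)]] = None
    return out


def missing_count(rows: List[Row], var: str) -> int:
    """Number of rows in which `var` is missing."""
    return sum(r[var] is None for r in rows)


def incomplete_rows(rows: List[Row]) -> int:
    """Number of rows with at least one missing value."""
    return sum(any(v is None for v in r.values()) for r in rows)


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV = number of outcome events / number of candidate predictors."""
    return n_events / n_predictors


def relative_bias(estimate: float, gold: float) -> float:
    """Bias of an estimate relative to the gold-standard (complete-data) estimate."""
    return (estimate - gold) / gold


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic records with hypothetical variables; not the study's data.
    data = [{"age": rng.uniform(18, 70), "x1": rng.random(), "x2": rng.random()}
            for _ in range(2720)]
    for rate in (0.10, 0.20, 0.50):
        s1 = amputate_single(data, "age", rate)
        s2 = amputate_spread(data, ["age", "x1", "x2"], rate)
        print(rate, missing_count(s1, "age"), missing_count(s2, "age"),
              incomplete_rows(s1), incomplete_rows(s2))
```

Holding the number of incomplete records fixed isolates the effect of the pattern: at a 50% rate, scenario 1 removes half of all `age` values, whereas scenario 2 removes only about a sixth of each of the three variables.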