BMC Research Notes

Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients

Abstract

Background Missing data is a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness, and therefore the method for dealing with it, is not the best option, but is this always true? This paper explores what happens when extra information suggesting that a particular mechanism is responsible for missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily. Regression models based on 2,533 intermediate care (IC) patients from the largest evaluation of IC done and published in the UK to date were used to explain variation in costs, EQ-5D and Barthel index. Three methods for dealing with missingness were used, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and the Heckman selection model (assuming missing not at random, MNAR). Differences in results were gauged by examining the signs of coefficients as well as the sizes of both coefficients and associated standard errors.
Results Extra information strongly suggested that missing cost data were MCAR. MCAR-based and MAR-based methods yielded similar results, with the sizes of most coefficients and standard errors differing by less than 3.4%, while those based on MNAR methods were statistically different (up to 730% bigger). Significant variables in all regression models also had the same direction of influence on costs. All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data. The method chosen to deal with missing data did not seem to have any significant effect on the results for these data, as they led to broadly similar conclusions with sizes of coefficients and standard errors differing by less than 54% and 322%, respectively.
Conclusions Arbitrary selection of methods to deal with missing data should be avoided. Using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate.
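
The comparison described in the abstract can be illustrated with a minimal Python sketch. It uses synthetic data, not the study's data or code. It contrasts complete case analysis with a simple multiple imputation of a missing outcome, pooled with Rubin's rules, and reports the percentage difference in coefficients. All function names (`ols`, `make_data`, `complete_case`, `multiple_imputation`, `pct_diff`), the data-generating model and the 30% MCAR missingness rate are assumptions made for illustration. The Heckman selection model is omitted.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients and their standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(xtx_inv))

def make_data(n=2000, missing_frac=0.3, seed=1):
    """Synthetic outcome y = 1 + 2x + noise, with y deleted completely at random."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    y[rng.random(n) < missing_frac] = np.nan  # MCAR: independent of x and y
    return X, y

def complete_case(X, y):
    """Drop rows with a missing outcome, then fit OLS."""
    keep = ~np.isnan(y)
    return ols(X[keep], y[keep])

def multiple_imputation(X, y, m=20, seed=0):
    """Impute missing y from a normal linear model, m times, and pool with Rubin's rules."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    X_obs, y_obs = X[obs], y[obs]
    n_obs, k = X_obs.shape
    beta_hat, _ = ols(X_obs, y_obs)
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    rss = np.sum((y_obs - X_obs @ beta_hat) ** 2)
    n_mis = int((~obs).sum())
    betas, variances = [], []
    for _ in range(m):
        # Draw model parameters first so imputations reflect parameter uncertainty.
        sigma2 = rss / rng.chisquare(n_obs - k)
        beta_draw = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        y_imp = y.copy()
        y_imp[~obs] = X[~obs] @ beta_draw + rng.normal(0.0, np.sqrt(sigma2), n_mis)
        b, se = ols(X, y_imp)
        betas.append(b)
        variances.append(se ** 2)
    betas, variances = np.array(betas), np.array(variances)
    pooled = betas.mean(axis=0)
    within = variances.mean(axis=0)
    between = betas.var(axis=0, ddof=1)
    total = within + (1 + 1 / m) * between  # Rubin's total variance
    return pooled, np.sqrt(total)

def pct_diff(a, b):
    """Absolute percentage difference of a relative to b, element-wise."""
    return 100 * np.abs(a - b) / np.abs(b)

X, y = make_data()
beta_cc, se_cc = complete_case(X, y)
beta_mi, se_mi = multiple_imputation(X, y)
print("complete case:", beta_cc, se_cc)
print("multiple imputation:", beta_mi, se_mi)
print("coefficient difference (%):", pct_diff(beta_mi, beta_cc))
```

When the data are MCAR, as simulated here, the two sets of coefficients should be close, which is the pattern the abstract reports for the cost data. A sketch like this says nothing about MNAR data, where a selection model would be needed.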