Correction of Selection Bias in Survey Data: Is the Statistical Cure Worse Than the Bias?

Abstract

In previous articles in the American Journal of Epidemiology (Am J Epidemiol. 2013;177(5):431–442) and American Journal of Public Health (Am J Public Health. 2013;103(10):1895–1901), Masters et al. reported age-specific hazard ratios for the contrasts in mortality rates between obesity categories. They corrected the observed hazard ratios for selection bias caused by what they postulated was the nonrepresentativeness of the participants in the National Health Interview Study that increased with age, obesity, and ill health. However, it is possible that their regression approach to remove the alleged bias has not produced, and in general cannot produce, sensible hazard ratio estimates. First, we must consider how many nonparticipants there might have been in each category of obesity and of age at entry and how much higher the mortality rates would have to be in nonparticipants than in participants in these same categories. What plausible set of numerical values would convert the ("biased") decreasing-with-age hazard ratios seen in the data into the ("unbiased") increasing-with-age ratios that they computed? Can these values be encapsulated in (and can sensible values be recovered from) one additional internal variable in a regression model? Second, one must examine the age pattern of the hazard ratios that have been adjusted for selection. Without the correction, the hazard ratios are attenuated with increasing age. With it, the hazard ratios at older ages are considerably higher, but those at younger ages are well below one. Third, one must test whether the regression approach suggested by Masters et al. would correct the nonrepresentativeness that increased with age and ill health that I introduced into real and hypothetical data sets. I found that the approach did not recover the hazard ratio patterns present in the unselected data sets: the corrections overshot the target at older ages and undershot it at lower ages.
This article was jointly published in the American Journal of Epidemiology (Am J Epidemiol. 2017;185(6):409–411) and the American Journal of Public Health (Am J Pub Health. 2017;107(4):503–505).
The Editors of the American Journal of Public Health (AJPH) and the American Journal of Epidemiology (AJE) asked me to comment on hazard ratio calculations in two articles by Masters et al.[1,2] At issue are age-specific hazard ratios for the differences in mortality rates between obesity categories. The hazard ratio estimates in the AJE article[2] were derived using Cox regression models that allowed for different hazard ratios for different age intervals; those in the subsequent AJPH article[1] were obtained using smooth-in-age hazard ratio models. These estimates deserve scrutiny because they were derived from a statistical approach that the authors claimed corrected the observed hazard ratios for selection bias,[3] that is, a distortion caused by what they postulated was the nonrepresentativeness of the participants in the National Health Interview Study (NHIS) that increased with age, obesity, and ill health. In the AJPH article, the authors used these corrected hazard ratios to arrive at population attributable fractions (a secondary topic that is addressed in Appendices 1 and 2, available as supplements to the online version of this article at http://www.ajph.org). The way Masters et al. arrived at the age-specific hazard ratios was addressed in their response to a letter to the editor in the AJE[4] and was also addressed in an unanswered letter to the editor in the AJPH.[5] Appendix 3 and Figure A (available as supplements to the online version of this article at http://www.ajph.org) show my simple reanalyses of some of the NHIS data and the age-specific hazard ratios (which were very similar to those of Masters et al.) that I obtained using their correction for selection bias.
However, it is possible that the way they removed the alleged bias has not produced, and in general cannot produce, sensible hazard ratio estimates. Masters et al. postulated that the nonrepresentativeness of the NHIS participants, which increased with age and ill health, distorted the age-specific mortality gradients across the body mass index (BMI) categories: Participants who were older at entry and in the higher-risk BMI categories would have been healthier than their same-age counterparts who did not participate. Ideally, to directly and accurately adjust the observed age-specific hazard ratios for this discrepancy, one would (1) add appropriate (specific to age at entry) numbers of nonparticipants to each of the BMI categories, (2) assign them higher mortality rates than those seen in persons in the same category who did participate, and (3) calculate a new (higher) rate for each category. It is difficult to see how estimates of these additional (age-at-entry–specific sets of) quantities implicated in the selection bias can be extracted from the NHIS data simply by using a regression model. Masters et a
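The three-step direct adjustment described above (add nonparticipants, give them higher mortality rates, recompute the category rate) amounts to a weighted average. The following is a minimal sketch in Python, not the calculation used in either article. Every number in it is hypothetical and chosen only for illustration; none comes from the NHIS or from Masters et al. It also assumes that person counts can stand in for person-time and that a simple rate ratio can stand in for the hazard ratio.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """One BMI category at a given age at entry (all values hypothetical)."""
    n_participants: int        # persons who took part in the survey
    rate_participants: float   # deaths per person-year among participants
    n_nonparticipants: int     # step 1: persons assumed to be missing
    rate_ratio_nonpart: float  # step 2: nonparticipant rate / participant rate

    def adjusted_rate(self) -> float:
        """Step 3: rate for participants and nonparticipants combined."""
        rate_nonpart = self.rate_participants * self.rate_ratio_nonpart
        deaths_per_year = (self.n_participants * self.rate_participants
                           + self.n_nonparticipants * rate_nonpart)
        return deaths_per_year / (self.n_participants + self.n_nonparticipants)


# Hypothetical inputs: the higher-risk category is assumed to be missing
# more people, and sicker people, than the reference category.
reference = Category(1000, 0.020, 100, 1.5)
obese = Category(1000, 0.024, 300, 2.0)

observed_ratio = obese.rate_participants / reference.rate_participants
adjusted_ratio = obese.adjusted_rate() / reference.adjusted_rate()

print(f"observed rate ratio: {observed_ratio:.2f}")
print(f"adjusted rate ratio: {adjusted_ratio:.2f}")
```

With these made-up inputs the ratio moves from 1.20 to about 1.41. The point of the sketch is that the size of the adjustment is driven entirely by the two assumed quantities per category (how many nonparticipants, and how much higher their mortality), which must be supplied separately for each age at entry.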