
Comparing Bootstrap and Jackknife Variance Estimation Methods for Area Under the ROC Curve Using One-Stage Cluster Survey Data

Abstract

The purpose of this research is to examine the bootstrap and jackknife as methods for estimating the variance of the AUC from a study using a complex sampling design, and to determine which characteristics of the sampling design affect this estimation. Data from a one-stage cluster sampling design of 10 clusters were examined. Factors included three true AUCs (.60, .75, and .90), three prevalence levels (50/50, 70/30, and 90/10, non-disease/disease), and three numbers of clusters sampled (2, 5, or 7). A simulated sample was constructed for each of the 27 combinations of AUC, prevalence, and number of clusters. Both the bootstrap and jackknife methods provided unbiased estimates of the AUC. In general, the bootstrap variance estimation method provided smaller variance estimates. For both the bootstrap and jackknife, the rarer the disease in the population, the higher the variance estimate, and as the true area increased, the variance estimate decreased. For both methods, the variance also decreased as the number of clusters sampled increased; however, the trend for the jackknife may be affected by outliers.

The National Health and Nutrition Examination Survey (NHANES), conducted by the CDC, is a complex survey that uses a one-stage cluster sampling design. A subset of the 2001-2002 NHANES data was created containing only adult women. A separate logistic regression analysis was conducted for each of four hormones (FSH, LH, TSH, and T4) to determine whether exposure to certain furans in the environment has an effect on abnormal hormone levels in women. Bootstrap and jackknife variance estimation techniques were applied to estimate the AUC and its variance for the four logistic regressions. The AUC estimates provided by the bootstrap and jackknife methods were similar, with the exception of LH. Unlike in the simulation study, the jackknife variance estimation method provided consistently smaller variance estimates than the bootstrap. AUC estimates for all four hormones suggested that exposure to furans affects abnormal hormone levels more than would be expected by chance. The bootstrap variance estimation technique provided better variance estimates for the AUC when sampling many clusters. When sampling only a few clusters, or when the entire population is treated as a single cluster, as in the NHANES study, the jackknife variance estimation method provided smaller variance estimates for the AUC.
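The two resampling schemes compared in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact estimators used, so everything here is an assumption made for illustration: the AUC is computed as the Mann-Whitney statistic, the bootstrap resamples whole clusters with replacement, and the jackknife is the delete-one-cluster version. The function names (`auc_mann_whitney`, `cluster_bootstrap_var`, `cluster_jackknife_var`) and the simulated data are hypothetical.

```python
import numpy as np


def auc_mann_whitney(scores, labels):
    """AUC as the share of (diseased, non-diseased) pairs ranked correctly; ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # AUC is undefined without both classes
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


def cluster_bootstrap_var(scores, labels, clusters, n_boot=500, seed=0):
    """Resample whole clusters with replacement (assumed scheme); return mean AUC and variance."""
    rng = np.random.default_rng(seed)
    ids = np.unique(clusters)
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choice(ids, size=ids.size, replace=True)
        idx = np.concatenate([np.flatnonzero(clusters == c) for c in drawn])
        estimates.append(auc_mann_whitney(scores[idx], labels[idx]))
    estimates = np.array(estimates)
    estimates = estimates[~np.isnan(estimates)]  # drop replicates missing a class
    return float(estimates.mean()), float(estimates.var(ddof=1))


def cluster_jackknife_var(scores, labels, clusters):
    """Delete-one-cluster jackknife (assumed scheme); return mean AUC and variance."""
    ids = np.unique(clusters)
    k = ids.size
    loo = np.array([
        auc_mann_whitney(scores[clusters != c], labels[clusters != c]) for c in ids
    ])
    variance = (k - 1) / k * np.sum((loo - loo.mean()) ** 2)
    return float(loo.mean()), float(variance)


if __name__ == "__main__":
    # Hypothetical data: 5 sampled clusters of 40 subjects, about 30% diseased.
    rng = np.random.default_rng(42)
    clusters = np.repeat(np.arange(5), 40)
    labels = (rng.random(clusters.size) < 0.30).astype(int)
    # Shifting diseased scores by ~0.95 SD targets a true AUC near 0.75.
    scores = rng.normal(size=clusters.size) + 0.95 * labels
    print("bootstrap (mean AUC, variance):", cluster_bootstrap_var(scores, labels, clusters))
    print("jackknife (mean AUC, variance):", cluster_jackknife_var(scores, labels, clusters))
```

Resampling clusters rather than individual subjects keeps the within-cluster dependence intact in each replicate, which is the point of using these methods with cluster survey data. The sketch ignores sampling weights, which a real survey analysis would likely need.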

Bibliographic details

  • Author

    Dunning Allison;

  • Author affiliation
  • Year 2009
  • Total pages
  • Original format PDF
  • Language of text
  • Chinese Library Classification
