
Regression diagnostics for complex survey data: Identification of influential observations.

Abstract

Discussions of diagnostics for linear regression models have become indispensable chapters or sections in most statistical textbooks. The survey literature, however, has given this problem little attention. Examples from real surveys show that including or excluding a small number of sampled units can sometimes greatly change the regression parameter estimates, which indicates that techniques for identifying influential units are necessary. The goal of this research is to extend and adapt the conventional ordinary least squares influence diagnostics to complex survey data, and to determine how they should be justified.

We assume that an analyst is looking for a linear regression model that fits reasonably well for the bulk of the finite population and chooses to use the survey-weighted regression estimator. Diagnostic statistics such as DFBETAS, DFFITS, and a modified Cook's Distance are constructed to evaluate the effect on the regression coefficients of deleting a single observation. As components of the diagnostic statistics, the estimated variances of the coefficients are obtained from design-consistent estimators that account for complex design features, e.g. clustering and stratification. For survey data, the sample weights, which are computed with the primary goal of estimating finite population statistics, are a source of influence in addition to the response variable and the predictor variables, and therefore need to be incorporated into influence measurement. The forward search method is also adapted to identify influential observations as a group when there may be a masking effect among the outlying observations.

Two case studies and simulations are carried out in this dissertation to test the performance of the adapted diagnostic statistics. We conclude that removing the identified influential observations from the model fitting yields less biased coefficient estimates. The standard errors of the coefficients may be underestimated, however, since the variation in the number of observations used in the regressions was not accounted for.
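The single-case deletion idea described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `weighted_dfbetas` is hypothetical, and the standard errors come from a simple sandwich (linearization-type) variance that treats units as independent. That is a simplifying assumption and ignores the clustering and stratification the abstract says the design-consistent estimators account for. The sketch fits the survey-weighted estimator, uses the closed-form leave-one-out update for weighted least squares, and scales each coefficient change by its standard error.

```python
import numpy as np

def weighted_dfbetas(X, y, w):
    """Leave-one-out coefficient changes for survey-weighted least squares.

    Illustrative sketch only: the variance below ignores clustering and
    stratification (assumption), so it is not a full design-based estimator.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Survey-weighted estimator: beta = (X'WX)^{-1} X'Wy
    A_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = A_inv @ (X.T @ (w * y))
    resid = y - X @ beta

    # Weighted leverages h_i = w_i * x_i' (X'WX)^{-1} x_i
    h = w * np.einsum("ij,jk,ik->i", X, A_inv, X)

    # Row i holds beta - beta_(i), the change from deleting unit i
    dfbeta = (X @ A_inv) * (w * resid / (1.0 - h))[:, None]

    # Sandwich variance assuming independent units (simplification)
    G = X * (w * resid)[:, None]
    V = A_inv @ (G.T @ G) @ A_inv
    se = np.sqrt(np.diag(V))

    # Scale each coefficient change by that coefficient's standard error
    return beta, dfbeta, dfbeta / se
```

Units with large absolute values in the scaled output are candidates for closer inspection; the choice of cutoff is left to the analyst here, since the abstract does not state one.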

Bibliographic details

  • Author: Li, Jianzhu
  • Author affiliation: University of Maryland, College Park
  • Degree-granting institution: University of Maryland, College Park
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 137
  • Original format: PDF
  • Language of text: English
