Regression diagnostics for complex survey data: Identification of influential observations.

Abstract

Discussions of diagnostics for linear regression models have become indispensable chapters or sections in most statistical textbooks. However, the survey literature has not given much attention to this problem. Examples from real surveys show that including or excluding a small number of the sampled units can sometimes greatly change the regression parameter estimates, which indicates that techniques for identifying influential units are necessary. The goal of this research is to extend and adapt the conventional ordinary least squares influence diagnostics to complex survey data, and to determine how they should be justified.

We assume that an analyst is looking for a linear regression model that fits reasonably well for the bulk of the finite population and chooses to use the survey-weighted regression estimator. Diagnostic statistics such as DFBETAS, DFFITS, and a modified Cook's Distance are constructed to evaluate the effect on the regression coefficients of deleting a single observation. As components of the diagnostic statistics, the estimated variances of the coefficients are obtained from design-consistent estimators that account for complex design features, e.g. clustering and stratification. For survey data, the sample weights, which are computed with the primary goal of estimating finite population statistics, are a source of influence in addition to the response variable and the predictor variables, and therefore need to be incorporated into influence measurement. The forward search method is also adapted to identify influential observations as a group when there is a possible masking effect among the outlying observations.

Two case studies and simulations are done in this dissertation to test the performance of the adapted diagnostic statistics. We reach the conclusion that removing the identified influential observations from the model fitting can yield less biased coefficient estimates. The standard errors of the coefficients may be underestimated, since the variation in the number of observations used in the regressions was not accounted for.
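
The single-observation deletion idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's estimator: it computes only the raw change in weighted least squares coefficients when one unit is dropped, and it omits the scaling by design-consistent standard errors that DFBETAS requires. The function names, the one-predictor model, and the data are all hypothetical and chosen only for illustration.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx  # zero only if all x values coincide
    b1 = (sw * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / sw
    return b0, b1


def deletion_changes(x, y, w):
    """For each unit i, return (full-sample estimate) - (estimate without i)."""
    b0, b1 = weighted_fit(x, y, w)
    changes = []
    for i in range(len(x)):
        # Refit with unit i removed (brute force; fine for a small sample).
        b0_i, b1_i = weighted_fit(
            x[:i] + x[i + 1:], y[:i] + y[i + 1:], w[:i] + w[i + 1:]
        )
        changes.append((b0 - b0_i, b1 - b1_i))
    return changes


# Hypothetical sample: the last unit has an extreme x value, a y value far
# from the trend of the others, and a large survey weight.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
w = [10.0, 12.0, 8.0, 11.0, 9.0, 40.0]

changes = deletion_changes(x, y, w)
# Index of the unit whose removal moves the slope the most.
most_influential = max(range(len(x)), key=lambda i: abs(changes[i][1]))
```

In this toy sample the flagged unit combines all three sources of influence named in the abstract: the predictor, the response, and the weight. A fuller treatment would divide each change by a design-based standard error before comparing it with a cutoff.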

Bibliographic record

Author: Li, Jianzhu
Author affiliation: University of Maryland, College Park
Degree-granting institution: University of Maryland, College Park
Subject: Statistics
Degree: Ph.D.
Year: 2007
Pages: 137 p.
Total pages: 137
Original format: PDF
Language: English

Similar literature

1. Identification of outlying and influential data with principal components regression estimation in binary logistic regression [J]. Ozkale M. Revan. Communications in Statistics. 2021, No. 1a3.

2. A diagnostic tool for regression analysis of complex survey data [J]. Wang Zilin, Bellhouse David. Statistical Papers. 2015, No. 4.

3. A comparison of various influential points diagnostic methods and robust regression approaches: Reanalysis of interstitial lung disease data [J]. A. Bagheri, H. Midi, M. Ganjali. Applied Mathematical Sciences. 2010, No. 25a28.

4. Analysis of intra-level isolation test structure data by multiple regression facilitate rule identification for diagnostic expert systems [C]. Freidhoff, C.B., Cresswell, . 1989.

5. Collinearity diagnostics for complex survey data [D]. Liao, Dan. 2010.

6. Using mixed effects logistic regression models for complex survey data on malaria rapid diagnostic test results [O]. Chigozie Louisa J. Ugwu, Temesgen T. Zewotir. 2018.
