Essays in Cluster Sampling and Causal Inference

Abstract

This thesis consists of three papers in applied statistics, specifically in cluster sampling, causal inference, and measurement error.

The first paper studies the problem of estimating the finite population mean from a two-stage sample with unequal selection probabilities in a Bayesian framework. Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. In a two-stage cluster sampling design, clusters are first selected with probability proportional to cluster size, and units are then randomly sampled within selected clusters. Methodological challenges arise when the sizes of nonsampled clusters are unknown. We propose both nonparametric and parametric Bayesian approaches for predicting the cluster size, and we implement inference for the unknown cluster sizes simultaneously with inference for the survey outcome. We implement this method in Stan and use simulation studies to compare the performance of an integrated Bayesian approach to classical methods on their frequentist properties. We then apply our proposed method to the Fragile Families and Child Wellbeing Study as an illustration of complex survey inference.

The second paper focuses on the problem of weak instrumental variables, motivated by estimating the causal effect of incarceration on recidivism. An instrument is weak when it is only weakly predictive of the treatment of interest. Given the well-known pitfalls of weak instrumental variables, we propose a method for strengthening a weak instrument. We use a matching strategy that pairs observations to be close on observed covariates but far on the instrument. This strategy strengthens the instrument, but with the tradeoff of reduced sample size. To help guide the applied researcher in selecting a match, we propose simulating the power of a sensitivity analysis and design sensitivity and using graphical methods to examine the results. We also demonstrate the use of recently developed methods for identifying effect modification, which is an interaction between a pretreatment covariate and the treatment. Larger and less variable treatment effects are less sensitive to unobserved bias, so identifying when effect modification is present and which covariates may be the source is important. We undertake our study in the context of studying the causal effect of incarceration on recidivism via a natural experiment in the state of Pennsylvania, a motivating example that illustrates each component of our analysis.

The third paper considers the issue of measurement error in the context of survey sampling and hierarchical models. Researchers are often interested in studying the relationship between community-level variables and individual outcomes. This approach often requires estimating the neighborhood-level variable of interest from the sampled households, which induces measurement error in the neighborhood-level covariate since not all households are sampled. Other times, neighborhood-level variables are not observed directly, and only a noisy proxy is available. In both cases, the observed variables may contain measurement error. Measurement error is known to attenuate the coefficient of the mismeasured variable, but it can also affect other coefficients in the model, and ignoring measurement error can lead to misleading inference. We propose a Bayesian hierarchical model that integrates an explicit model for the measurement error process along with a model for the outcome of interest for both sampling-induced measurement error and classical measurement error. Advances in Bayesian computation, specifically the development of the Stan probabilistic programming language, make the implementation of such models easy and straightforward.
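
The attenuation effect mentioned for the third paper can be illustrated with a small simulation. This is only a minimal sketch under classical measurement error with a single covariate, not the Bayesian hierarchical model proposed in the thesis. The function names, sample size, and variance settings are hypothetical choices made for illustration. Under these assumptions, the least-squares slope on the noisy covariate shrinks toward zero by the reliability ratio sd_x^2 / (sd_x^2 + sd_u^2).

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_eps=1.0, seed=0):
    """Compare slopes fitted on the true covariate and on a noisy proxy.

    All parameter values are illustrative assumptions, not taken from the thesis.
    """
    rng = random.Random(seed)
    # True covariate and outcome: y = beta * x + noise.
    x_true = [rng.gauss(0.0, sd_x) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, sd_eps) for x in x_true]
    # Observed proxy: true covariate plus independent classical error.
    w = [x + rng.gauss(0.0, sd_u) for x in x_true]
    # Theoretical attenuated slope under classical measurement error.
    reliability = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)
    return ols_slope(x_true, y), ols_slope(w, y), beta * reliability


slope_true, slope_noisy, slope_expected = simulate_attenuation()
print(f"slope on true covariate:  {slope_true:.3f}")
print(f"slope on noisy proxy:     {slope_noisy:.3f}")
print(f"theoretical attenuation:  {slope_expected:.3f}")
```

With equal variances for the covariate and the error, the reliability ratio is one half, so the slope on the proxy should land near half of the true coefficient. A model that includes an explicit measurement error component, as the abstract describes, is one way to correct for this shrinkage.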

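Bibliographic record

  • Author

    Makela, Susanna Maria

  • Author affiliation

    Columbia University

  • Degree-granting institution: Columbia University
  • Discipline: Statistics
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 169 p.
  • Total pages: 169
  • Original format: PDF
  • Language: English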