
Bayesian and frequentist cross-validation methods for explanatory item response models


Abstract

The chapters of this dissertation are intended to be three independent, publishable papers, but they nevertheless share the theme of predictive inference for explanatory item response models. Chapter 1 describes the differences between the Bayesian and frequentist statistical frameworks in the context of explanatory item response models. The particular model of focus, the "doubly explanatory model", is a model for dichotomous item responses that includes covariates for person ability and covariates for item difficulty. It includes many Rasch-family models as special cases. Differences in how the model is understood and specified within the two frameworks are discussed, and the various predictive inferences available from the model are defined for each framework.

Chapter 2 is situated in the frequentist framework and focuses on approaches for explaining or predicting the difficulties of items. Within this framework, the linear logistic test model (LLTM) is likely to be used for this purpose; in essence, it regresses item difficulty on covariates for characteristics of the items. However, this regression does not include an error term, so the model is in general misspecified. Meanwhile, adding an error term to the LLTM makes maximum likelihood estimation infeasible. To address this problem, a two-stage modeling strategy (LLTM-E2S) is proposed: in the first stage, maximum likelihood estimates of the item difficulties and their standard errors are obtained from a Rasch model; in the second stage, a random-effects meta-analysis regression of the Rasch difficulties on the item covariates is performed, incorporating the uncertainty in the item difficulty estimates. In addition, holdout validation, cross-validation, and the Akaike information criterion (AIC) are discussed as means of comparing models that have different sets of item predictors.
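The second stage of the strategy described above can be sketched in code. This is an illustrative implementation, not the dissertation's own: the function name is invented here, and it estimates the residual item variance by profiling the marginal normal likelihood rather than whatever estimator the dissertation actually uses. The first-stage Rasch estimates and standard errors are taken as given inputs.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lltm_e2s_stage2(d_hat, se, X):
    """Random-effects meta-regression of first-stage Rasch item
    difficulty estimates on item covariates (a sketch of stage 2
    of an LLTM-E2S-style strategy).

    Model: d_hat_i = x_i' beta + u_i + e_i,
           u_i ~ N(0, tau2), e_i ~ N(0, se_i^2) with se_i known.

    d_hat : (I,) item difficulty estimates from the Rasch model
    se    : (I,) their standard errors (treated as known)
    X     : (I, p) item covariate matrix (include a column of ones)
    Returns (beta, tau2).
    """
    v = se ** 2

    def profile_nll(tau2):
        # For fixed tau2, beta is the weighted least squares solution
        # with precision weights 1 / (se_i^2 + tau2).
        w = 1.0 / (v + tau2)
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ d_hat)
        resid = d_hat - X @ beta
        # Marginal normal negative log-likelihood, up to a constant,
        # profiled over beta.
        return 0.5 * np.sum(np.log(v + tau2) + w * resid ** 2)

    tau2 = minimize_scalar(profile_nll, bounds=(1e-8, 10.0),
                           method="bounded").x
    w = 1.0 / (v + tau2)
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ d_hat)
    return beta, tau2
```

The key difference from fitting the LLTM directly is the variance term `tau2`: it absorbs item-level misfit that the LLTM's error-free regression has no way to represent, and the weights propagate the first-stage estimation uncertainty into the covariate effects.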
I argue that AIC used with the LLTM estimates the expected deviance of the fitted model when applied to new observations from the same sample of items and persons, which is unsuitable for assessing the ability of the model to predict item difficulties. On the other hand, AIC applied to the LLTM-E2S provides the expected deviance related to new observations arising from new items, which is what is needed. A simulation study compares parameter recovery and model comparison results for the two modeling strategies.

Chapter 3 takes a Bayesian outlook and focuses on models that explain or predict person abilities. I argue that the usual application of Bayesian forms of information criteria to these models yields the wrong inference. Specifically, when using likelihoods that are conditional on person ability, information criteria estimate the expected fit of the model to new data arising from the same persons. What are needed are likelihoods that are marginal over the distribution of ability, which may be used with information criteria to estimate the expected fit to new data from a new sample of persons. The widely applicable information criterion (WAIC), the Pareto-smoothed importance sampling approximation to leave-one-out cross-validation (PSIS-LOO), and the deviance information criterion (DIC) are discussed in the context of these conditional and marginal likelihoods. An adaptive quadrature scheme for use within Markov chain Monte Carlo estimation is proposed to obtain the marginal likelihoods. Also, the moving block bootstrap is investigated as a means of estimating the Monte Carlo error of Bayesian information criterion estimates. A simulation study using a linear random intercept model assesses the accuracy of the adaptive quadrature scheme and the bootstrap estimates of Monte Carlo error. These methods are then applied to a real item response dataset, demonstrating the practical difference between the conditional and marginal forms of the information criteria.
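The conditional-versus-marginal distinction above can be made concrete with a sketch. The dissertation proposes an adaptive quadrature scheme inside MCMC; the minimal version below uses ordinary (non-adaptive) Gauss-Hermite quadrature to compute one person's marginal likelihood under a Rasch-type model, integrating the person ability out against its normal distribution. The function name and the fixed number of nodes are assumptions made here for illustration.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def marginal_loglik_person(y, beta, mu=0.0, sigma=1.0, n_nodes=21):
    """Marginal log-likelihood of one person's dichotomous responses
    under a Rasch-type model, with ability theta ~ N(mu, sigma^2)
    integrated out by Gauss-Hermite quadrature.

    A conditional likelihood would instead plug in a single theta
    value; the marginal form averages the conditional likelihood
    over the ability distribution.

    y    : (I,) 0/1 responses
    beta : (I,) item difficulties
    """
    z, w = hermgauss(n_nodes)              # nodes/weights for exp(-z^2)
    theta = mu + np.sqrt(2.0) * sigma * z  # change of variables
    # P(y_i = 1 | theta) = logistic(theta - beta_i), at each node
    eta = theta[:, None] - beta[None, :]   # (nodes, items)
    p = 1.0 / (1.0 + np.exp(-eta))
    lik_given_theta = np.prod(np.where(y == 1, p, 1.0 - p), axis=1)
    # Gauss-Hermite: integral f(theta) dN(theta) ~ sum w_k f_k / sqrt(pi)
    return np.log(np.sum(w * lik_given_theta) / np.sqrt(np.pi))
```

Feeding these marginal (rather than conditional) pointwise log-likelihoods into WAIC or PSIS-LOO is what targets prediction for a *new* sample of persons; an adaptive scheme would additionally recenter and rescale the nodes around each person's posterior mode for accuracy with fewer nodes.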

Bibliographic details

  • Author

    Furr, Daniel C.

  • Affiliation

    University of California, Berkeley

  • Degree grantor: University of California, Berkeley
  • Subject: Educational tests & measurements; Statistics
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 74 p.
  • Total pages: 74
  • Format: PDF
  • Language: English
  • Date added: 2022-08-17 11:54:30
