Applied Psychological Measurement

Journal Information

  • Journal title: Applied Psychological Measurement
  • Frequency: Eight issues a year, 2008-

209 results (page 9 of 11)
  • New Robust Scale Transformation Methods in the Presence of Outlying Common Items
    Abstract: Common items play an important role in item response theory (IRT) true score equating under the common-item nonequivalent groups design. Biased item parameter estimates due to common item outliers can lead to large errors in equated scores. Current methods used to screen for common item outliers mainly focus on the detection and elimination of those items, which may lead to inadequate content representation for the common items. To reduce the impact of inconsistency in item parameter estimates while maintaining content representativeness, the authors propose two robust scale transformation methods based on two weighting methods: the Area-Weighted method and the Least Absolute Values (LAV) method. Results from two simulation studies indicate that these robust scale transformation methods performed as well as the Stocking-Lord method in the absence of common item outliers and, more importantly, outperformed the Stocking-Lord method when a single outlying common item was simulated.
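To make the LAV idea concrete, the sketch below estimates linking constants A and B by minimizing the sum of absolute deviations between transformed common-item difficulties, which resists a single outlying item far better than a least-squares (mean-sigma) fit. The item values and the Nelder-Mead search are illustrative assumptions, not the authors' implementation.

```python
# Minimal LAV scale-linking sketch on common-item difficulties; illustrative
# only, not the article's exact method.
import numpy as np
from scipy.optimize import minimize

# Hypothetical common-item difficulties on the old and new scales;
# the last item is an outlier on the new form.
b_old = np.array([-1.2, -0.5, 0.0, 0.6, 1.1])
b_new = np.array([-1.0, -0.3, 0.2, 0.8, 2.9])

def lav_loss(params):
    A, B = params
    return np.sum(np.abs(A * b_new + B - b_old))

# Least-squares (mean-sigma) slope and intercept as start values.
A0 = b_old.std() / b_new.std()
B0 = b_old.mean() - A0 * b_new.mean()

res = minimize(lav_loss, x0=[A0, B0], method="Nelder-Mead")
A_lav, B_lav = res.x
print(f"mean-sigma: A={A0:.3f}, B={B0:.3f}")
print(f"LAV:        A={A_lav:.3f}, B={B_lav:.3f}")  # barely pulled by the outlier
```

On these toy numbers the four regular items satisfy b_old = b_new - 0.2 exactly, so the LAV solution sits near A = 1, B = -0.2 while the mean-sigma constants are dragged by the outlier.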
  • Detecting Intervention Effects in Cluster-Randomized Designs Using Multilevel Structural Equation Modeling for Binary Responses
    Abstract: Multilevel modeling (MLM) is frequently used to detect group differences, such as an intervention effect in a pre-test–post-test cluster-randomized design. Group differences on the post-test scores are detected by controlling for pre-test scores as a proxy variable for unobserved factors that predict future attributes. The pre-test and post-test scores that are most often used in MLM are summed item responses (or total scores). Prior research has raised concerns about measurement error when total scores are used in MLM. To correct for measurement error in the covariate and outcome, a theoretical justification for the use of multilevel structural equation modeling (MSEM) has been established. However, MSEM for binary responses has not been widely applied to detect intervention effects (group differences) in intervention studies. In this article, the use of MSEM for intervention studies is demonstrated and the performance of MSEM is evaluated via a simulation study. Furthermore, the consequences of using MLM instead of MSEM are shown in detecting group differences. Results of the simulation study showed that MSEM performed adequately as the number of clusters, cluster size, and intraclass correlation increased and outperformed MLM for the detection of group differences.
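The measurement-error concern can be made concrete with a toy simulation: when groups differ at baseline, adjusting for a fallible total score under-corrects and biases the estimated group effect. All numbers below are illustrative assumptions, and this is ordinary least squares, not the MSEM the article evaluates.

```python
# Toy simulation (not the article's MSEM): measurement error in a pre-test
# covariate biases the covariate-adjusted group effect when groups differ
# at baseline. Parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, effect, rel = 5000, 0.4, 0.6               # sample size, true effect, pre-test reliability
group = rng.integers(0, 2, n)                 # treatment indicator
pre_true = 0.5 * group + rng.normal(0, 1, n)  # groups are non-equivalent at baseline
post = effect * group + 0.8 * pre_true + rng.normal(0, 0.5, n)
pre_obs = pre_true + rng.normal(0, np.sqrt(1 / rel - 1), n)  # fallible total score

def adjusted_effect(covariate):
    X = np.column_stack([np.ones(n), group, covariate])
    return np.linalg.lstsq(X, post, rcond=None)[0][1]

print("latent pre-test:  ", round(adjusted_effect(pre_true), 3))  # ~0.40
print("fallible pre-test:", round(adjusted_effect(pre_obs), 3))   # biased upward
```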
  • A Comparison of Two Methods for Computing IRT Scores From Number-Correct Scores
    Abstract: Two estimates for item response theory latent trait scores (θ) based on the summed, number-correct score, X, were compared: (a) the so-called test characteristic curve (TCC) estimates, θTCC, in which the TCC is inverted so that a value of θ can be estimated directly from X, and (b) the expected a posteriori—or Bayesian posterior mean—estimates, θEAP. Using data from Tenth-Grade English and Math Tests, the conditional, expected values for θTCC and θEAP (using both normal N(0, 1) and N(0, 10) priors), along with their conditional standard errors, were computed and plotted against a grid of actual θs. Under a normal N(0, 1) prior, it was found that the Bayesian θEAPs showed considerably smaller standard errors of measurement compared with the θTCCs—especially in the tails of the θ-distribution. However, the bias of the θEAPs based on the N(0, 1) prior was substantial in the extremes of the distribution of θ. The normal N(0, 10) prior for computing the θEAPs reduced their bias but increased their standard errors; these were not unexpected statistical results, given the nearly universal trade-off between bias and standard error. The choice among the three summed-score θ-estimates examined here depends largely on which of the two major sources of distortion—bias versus standard error—is the more harmful.
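A minimal sketch of the two estimators under an assumed 2PL test: θTCC inverts the test characteristic curve numerically, and θEAP averages over a normal prior using the Lord-Wingersky summed-score likelihood. The item parameters, quadrature grid, and prior settings are illustrative assumptions.

```python
# Two summed-score theta estimates under a hypothetical 2PL test:
# inverse-TCC scoring versus EAP with a normal prior.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # assumed 2PL slopes
b = np.array([-1.0, -0.3, 0.1, 0.7, 1.4])  # assumed difficulties

def p(theta):
    return 1 / (1 + np.exp(-a * (theta - b)))

def theta_tcc(x):
    # Invert the test characteristic curve: find theta with E[X | theta] = x
    return brentq(lambda t: p(t).sum() - x, -6, 6)

def summed_score_likelihood(theta, x):
    # Lord-Wingersky recursion via repeated convolution
    dist = np.array([1.0])
    for pj in p(theta):
        dist = np.convolve(dist, [1 - pj, pj])
    return dist[x]

def theta_eap(x, prior_var=1.0):
    nodes = np.linspace(-6, 6, 121)
    prior = norm.pdf(nodes, scale=np.sqrt(prior_var))
    like = np.array([summed_score_likelihood(t, x) for t in nodes])
    post = prior * like
    return float(np.sum(nodes * post) / post.sum())

for x in range(1, 5):   # interior summed scores only (the TCC asymptotes)
    print(x, round(theta_tcc(x), 2),
          round(theta_eap(x, 1.0), 2), round(theta_eap(x, 10.0), 2))
```

The N(0, 10) column reproduces the abstract's qualitative pattern: estimates closer to the inverse-TCC values (less shrinkage bias) at the cost of a more diffuse posterior.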
  • DINA Models for Multiple-Choice Items With Few Parameters
    • Author: Koken Ozaki
    • Journal: Applied Psychological Measurement
    • 2015, Issue 6
    Abstract: The deterministic-input, noisy “and” gate (DINA) model can judge whether an individual examinee has mastered each skill that is needed to answer an item correctly. This information is useful for students to know their deficits and for teachers to teach effectively. The DINA model is a statistical model for binary (correct or incorrect) data. However, recently a DINA model for multiple-choice items was developed by de la Torre. The model is aimed at obtaining information about students’ skills from incorrect answers. In the present study, new DINA models for multiple-choice items are developed that require far fewer parameters while still being able to express various answering probabilities without any restrictions on the form of the Q-matrix. Simulations using a Markov chain Monte Carlo method are performed to demonstrate the efficacies of the proposed models compared with the DINA model for binary data and the model of de la Torre for multiple-choice items, provided that appropriate starting values are set.
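A minimal sketch of the binary DINA response function that these models extend: an examinee answers correctly with probability 1 - s_j when they hold every skill the Q-matrix requires for the item (eta = 1), and with guessing probability g_j otherwise. The Q-matrix and parameter values are made up for illustration.

```python
# Binary DINA sketch: P(correct) = (1 - s)^eta * g^(1 - eta), where
# eta = 1 iff the examinee masters every skill the Q-matrix requires.
import numpy as np

Q = np.array([[1, 0, 1],             # item-by-skill Q-matrix (hypothetical)
              [0, 1, 0],
              [1, 1, 1]])
slip = np.array([0.10, 0.20, 0.15])  # slip parameters s_j
guess = np.array([0.20, 0.25, 0.10]) # guessing parameters g_j

def p_correct(alpha):
    """alpha: 0/1 skill-mastery vector for one examinee."""
    eta = np.all(alpha >= Q, axis=1).astype(float)  # conjunctive ("and") rule
    return (1 - slip) ** eta * guess ** (1 - eta)

print(p_correct(np.array([1, 0, 1])))  # masters skills 1 and 3: [0.9, 0.25, 0.1]
```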
  • Marginal Maximum Likelihood Estimation of the 1PL-AG IRT Model
    Abstract: Marginal maximum likelihood estimation based on the expectation–maximization algorithm (MML/EM) is developed for the one-parameter logistic item response theory (IRT) model with ability-based guessing (1PL-AG). The use of the MML/EM estimator is cross-validated with estimates from the NLMIXED procedure (PROC NLMIXED) in the Statistical Analysis System (SAS). Numerical data are provided for comparisons of results from MML/EM and PROC NLMIXED.
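The MML/EM machinery itself can be illustrated on the plain 1PL model; the 1PL-AG's ability-based guessing component is deliberately omitted here to keep the sketch short and safe. The data generation, quadrature grid, and iteration count are all assumptions.

```python
# Sketch of MML/EM for a plain 1PL model with a N(0,1) ability prior;
# the 1PL-AG adds an ability-based guessing component not shown here.
import numpy as np

rng = np.random.default_rng(0)
n, J = 2000, 10
b_true = np.linspace(-1.5, 1.5, J)
theta = rng.normal(size=(n, 1))
X = (rng.random((n, J)) < 1 / (1 + np.exp(-(theta - b_true)))).astype(float)

nodes = np.linspace(-4, 4, 41)                 # quadrature grid
w = np.exp(-nodes**2 / 2); w /= w.sum()        # N(0,1) weights
b = np.zeros(J)
for _ in range(50):
    P = 1 / (1 + np.exp(-(nodes[:, None] - b)))          # K x J
    logL = X @ np.log(P).T + (1 - X) @ np.log(1 - P).T   # n x K
    post = np.exp(logL) * w                              # E-step posteriors
    post /= post.sum(axis=1, keepdims=True)
    nk = post.sum(axis=0)       # expected examinee count at each node
    rj = post.T @ X             # K x J expected correct counts
    # M-step: one Newton step per item on the expected log-likelihood
    grad = (nk[:, None] * P - rj).sum(axis=0)
    hess = -(nk[:, None] * P * (1 - P)).sum(axis=0)
    b -= grad / hess
print(np.round(b, 2))           # close to b_true
```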
  • Consistency of Cluster Analysis for Cognitive Diagnosis
    Abstract: The Asymptotic Classification Theory of Cognitive Diagnosis (ACTCD) developed by Chiu, Douglas, and Li proved that for educational test data conforming to the Deterministic Input Noisy Output “AND” gate (DINA) model, the probability that hierarchical agglomerative cluster analysis (HACA) assigns examinees to their true proficiency classes approaches 1 as the number of test items increases. This article proves that the ACTCD also covers test data conforming to the Deterministic Input Noisy Output “OR” gate (DINO) model. It also demonstrates that an extension to the statistical framework of the ACTCD, originally developed for test data conforming to the Reduced Reparameterized Unified Model or the General Diagnostic Model, (a) is also valid for both the DINA model and the DINO model and (b) substantially increases the accuracy of HACA in classifying examinees when the test data conform to either of these two models.
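The HACA step in this line of work can be sketched as follows: summarize each examinee by attribute-wise proportion-correct scores derived from the Q-matrix, then cluster agglomeratively into at most 2^K candidate proficiency classes. The simulated Q-matrix, random responses, and Ward linkage below are illustrative choices only.

```python
# HACA sketch: cluster examinees on attribute-wise mean scores W_k
# computed from the Q-matrix (illustrative data, not the article's).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# 20 items x 3 attributes; OR with a one-hot row so each item needs >= 1 skill
Q = rng.integers(0, 2, (20, 3)) | np.eye(3, dtype=int)[rng.integers(0, 3, 20)]
X = rng.integers(0, 2, (100, 20))   # hypothetical 0/1 response matrix

# W_k: an examinee's mean score over the items requiring attribute k
W = (X @ Q) / Q.sum(axis=0)

Z = linkage(W, method="ward")
labels = fcluster(Z, t=2**Q.shape[1], criterion="maxclust")  # up to 2^K classes
print(np.bincount(labels)[1:])       # cluster sizes
```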
  • A Multivariate Generalizability Theory Approach to Standard Setting
    Abstract: Generalizability theory (G theory) allows researchers to assess the many sources of variance inherent in complex standard setting procedures involving the determination of cut scores. The flexibility of G and D studies provides a way to conceptualize and quantify the results of different standard settings once the universe of admissible observations and the universe of generalization are defined. The current article applies a multivariate single-facet design for estimating standard errors of cut scores. For practical purposes, several multivariate D study designs are used to investigate what effect various panel sizes and test lengths have on the precision of the standard setting process. The current study demonstrates the advantages and usefulness of multivariate G theory in determining the accuracy of cut scores in practical applications of standard setting procedures.
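A univariate simplification of the D-study logic: treat the cut score as the mean of panelist-by-item ratings and compute its absolute-error standard error for different panel sizes. The variance components below are made-up numbers, not estimates from the article.

```python
# Univariate D-study sketch: cut score = grand-mean rating over a
# panelist-by-item design; variance components are illustrative assumptions.
import numpy as np

var_p, var_i, var_pi = 0.004, 0.010, 0.020  # panelist, item, interaction/residual

def se_cut(n_p, n_i):
    # Absolute-error SE of the mean cut score with n_p panelists and n_i items
    return np.sqrt(var_p / n_p + var_i / n_i + var_pi / (n_p * n_i))

for n_p in (5, 10, 15, 20):
    print(f"{n_p:2d} panelists, 40 items: SE = {se_cut(n_p, 40):.4f}")
```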
  • Alternative Hypothesis Testing Procedures for DIMTEST
    Abstract: Many commonly used item response models make the unidimensionality assumption of a single latent trait underlying the response data. The validity of this assumption needs to be tested before these models can be applied. One option is to use Stout’s non-parametric hypothesis test of essential unidimensionality, which is operationalized in the DIMTEST procedure. Although generally successful, Type I error rates of this procedure are usually deflated in small samples and inflated in large samples, while power can be low in small samples. A possible cause for the unfavorable Type I error rates and power may be that estimates of the sampling distribution, bias, and standard error of the test statistic are not sufficiently accurate in finite samples. In this study, five alternative hypothesis testing procedures were formulated that replace the (asymptotically correct) approximations in the current DIMTEST procedure with computational alternatives. The performance of these procedures was investigated in two simulation studies. One of these alternative procedures, which uses a conditional covariance statistic directly in a bootstrap hypothesis test, exhibited better controlled Type I errors and higher power than the current DIMTEST procedure in most conditions. Averaged over all sample sizes and correlations between two underlying dimensions, power increased by 5 percentage points for simple structure and by 7 percentage points for approximate simple structure.
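The core of the best-performing alternative can be sketched generically: compute an average conditional covariance of the suspected-dimension (AT) items given the rest score, then compare it with replicates simulated from a unidimensional model fitted to the data. Here `simulate_unidimensional` is a hypothetical stand-in for that fitted-model simulator, and the statistic is a simplified stand-in for DIMTEST's.

```python
# Generic bootstrap test built on a conditional-covariance statistic;
# a simplified sketch, not DIMTEST's exact statistic or procedure.
import numpy as np

def cond_cov_stat(X, at_items):
    """Average pairwise covariance of the AT items, conditional on the
    rest score over the remaining (PT) items."""
    rest = np.delete(X, at_items, axis=1).sum(axis=1)
    covs = []
    for r in np.unique(rest):
        grp = X[rest == r][:, at_items]
        if grp.shape[0] > 2:
            c = np.cov(grp, rowvar=False)
            covs.append(c[np.triu_indices_from(c, k=1)].mean())
    return float(np.mean(covs))

def bootstrap_p(X, at_items, simulate_unidimensional, B=500):
    """simulate_unidimensional() is a hypothetical helper that must return a
    data set drawn from a unidimensional model fitted to X."""
    t_obs = cond_cov_stat(X, at_items)
    t_null = [cond_cov_stat(simulate_unidimensional(), at_items)
              for _ in range(B)]
    return float(np.mean(np.array(t_null) >= t_obs))
```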
  • Assessing Item-Level Fit for the DINA Model
    Abstract: This research focuses on developing item-level fit checking procedures in the context of diagnostic classification models (DCMs), and more specifically for the “Deterministic Input; Noisy ‘And’ gate” (DINA) model. Although there is a growing body of literature discussing model fit checking methods for DCM, item-level fit analysis is not adequately discussed in the literature. This study takes a first step toward filling this gap. Two approaches are proposed: one stems from classical goodness-of-fit test statistics coupled with the Expectation-Maximization algorithm for model estimation, and the other is the posterior predictive model checking (PPMC) method coupled with Markov chain Monte Carlo estimation. For both approaches, the chi-square statistic and a power-divergence index are considered, along with Stone’s method for considering uncertainty in latent attribute estimation. A simulation study with varying manipulated factors is carried out. Results show that both approaches are promising if Stone’s method is imposed, but the classical goodness-of-fit approach has a much higher detection rate (i.e., proportion of misfit items that are correctly detected) than the PPMC method.
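Two generic ingredients of the approaches described above, sketched under assumed interfaces: a Pearson-type discrepancy (the lambda = 1 member of the power-divergence family) and the posterior predictive p-value that compares realized and replicated discrepancies across posterior draws.

```python
# Generic PPMC ingredients for item-level fit checking; the statistic and
# interfaces are illustrative, not the article's exact implementation.
import numpy as np

def pearson_item_chisq(obs_prop, exp_prop, n_per_group):
    """Pearson chi-square over score (or attribute-class) groups for one
    item: the power-divergence statistic with lambda = 1."""
    return np.sum(n_per_group * (obs_prop - exp_prop) ** 2
                  / (exp_prop * (1 - exp_prop)))

def ppp_value(realized, replicated):
    """Posterior predictive p-value: share of posterior draws whose
    replicated discrepancy meets or exceeds the realized one.
    Values near 0 or 1 flag item misfit."""
    return float(np.mean(np.asarray(replicated) >= np.asarray(realized)))
```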
  • Score
  • Weighting Components of a Composite Score Using Naïve Expert Judgments About Their Relative Importance
    • Author: Peter Baldwin
    • Journal: Applied Psychological Measurement
    • 2015, Issue 7
    Abstract: A common problem that arises in testing—as well as other contexts such as candidate selection—is how to combine various scores into a weighted composite that reflects expert judgments about each component’s relative importance. For experts to provide nominal weights explicitly, they must fully account for the variances of the components, the covariances among components, and the reliability of each component. This task can be challenging, and in many cases, experts may have greater success making simple judgments about component importance without regard for the variances, covariances, and reliabilities. In this article, it is shown how to estimate the requisite nominal weights when only these kinds of naïve judgments are available, and the analytical solution is demonstrated with a small simulation study. Results from the simulation suggest that the proposed estimators could yield more valid composite scores in practice.
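One way to see the estimation problem: if a component's effective contribution is defined as its covariance with the composite, then matching judged importances p requires solving Sigma w proportional to p for the nominal weights w. The sketch below uses that definition with made-up numbers; it illustrates the general idea, not necessarily Baldwin's estimator.

```python
# Nominal weights from naive importance judgments, defining a component's
# effective contribution as its covariance with the composite, so that
# Sigma @ w is proportional to p. Numbers are illustrative assumptions.
import numpy as np

p = np.array([0.5, 0.3, 0.2])             # judged relative importance
Sigma = np.array([[1.00, 0.30, 0.20],     # component covariance matrix
                  [0.30, 0.64, 0.15],
                  [0.20, 0.15, 0.25]])

w = np.linalg.solve(Sigma, p)             # nominal weights (up to scale)
w /= w.sum()
contrib = Sigma @ w
print(np.round(w, 3))
print(np.round(contrib / contrib.sum(), 3))   # recovers p = (0.5, 0.3, 0.2)
```

Note how the third component, with the smallest variance, needs a disproportionately large nominal weight to achieve its judged share, which is exactly the bookkeeping naive judges skip.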
  • SEAsic
    Abstract: SEAsic (score equity assessment–summary index computation) is an R package for computing and graphing a variety of indices that quantify an important aspect of test fairness, that of reported score equity. Historically, test fairness has been statistically defined as a lack of differential prediction and/or the presence of measurement invariance at the item level. More recent definitions of fairness include the concept of score equity, which calls for additional subpopulation analysis at the equated, reported test score level. SEAsic allows for efficient calculation and graphing of multiple score equity assessment (SEA) indices. All indices in Huggins and Penfield (2012) can be calculated and plotted in various ways given a user-provided conversion table. SEAsic is freely available on the Comprehensive R Archive Network (CRAN). Mac, Windows, and Linux users have access to the package via downloading the appropriate version of R or RStudio from the CRAN website. Multiple examples of each index computation, variations on each index, and plot options are provided in the package manual on CRAN.
  • The Effect of the Upper and Lower Asymptotes of IRT Models on Computerized Adaptive Testing
    Abstract: In this article, the effect of the upper and lower asymptotes in item response theory models on computerized adaptive testing is shown analytically. This is done by deriving the step size between adjacent latent trait estimates under the four-parameter logistic model (4PLM) and two models it subsumes, the usual three-parameter logistic model (3PLM) and the 3PLM with upper asymptote (3PLMU). The authors show analytically that the large effect of the discrimination parameter on the step size holds true for the 4PLM and the two models it subsumes under both the maximum information method and the b-matching method for item selection. Furthermore, the lower asymptote helps reduce the positive bias of ability estimates associated with early guessing, and the upper asymptote helps reduce the negative bias induced by early slipping. Relative step size between modeling versus not modeling the upper or lower asymptote under the maximum Fisher information method (MI) and the b-matching method is also derived. It is also shown analytically why the gain from early guessing is smaller than the loss from early slipping when the lower asymptote is modeled, and vice versa when the upper asymptote is modeled. The benefit-to-loss ratio is quantified under both the MI and the b-matching method. Implications of the analytical results are discussed.
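A sketch of the 4PLM response function and its Fisher information, the quantities from which the step-size and bias results are derived; the 3PLM (d = 1) and 3PLMU (c = 0) fall out as special cases. Parameter values are arbitrary.

```python
# 4PLM item response function and Fisher information; the 3PLM and 3PLMU
# are the d = 1 and c = 0 special cases, respectively.
import numpy as np

def p4(theta, a, b, c, d):
    return c + (d - c) / (1 + np.exp(-a * (theta - b)))

def info4(theta, a, b, c, d):
    # I(theta) = [P'(theta)]^2 / (P (1 - P))
    e = 1 / (1 + np.exp(-a * (theta - b)))
    dp = a * (d - c) * e * (1 - e)
    P = p4(theta, a, b, c, d)
    return dp**2 / (P * (1 - P))

theta = np.linspace(-3, 3, 7)
print(np.round(info4(theta, a=1.5, b=0.0, c=0.2, d=1.0), 3))  # 3PLM
print(np.round(info4(theta, a=1.5, b=0.0, c=0.2, d=0.9), 3))  # 4PLM
```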
  • Book Review: Handbook of Item Response Theory Modeling: Applications to Typical Performance Assessment
    • Author: Sandip Sinharay
    • Journal: Applied Psychological Measurement
    • 2015, Issue 6
  • Evaluating the Consistency and Reliability of the Defense Automated Neurobehavioral Assessment Tool
    Abstract: A durable, portable, and field-hardened computerized neurocognitive test (CNT) called the Defense Automated Neurobehavioral Assessment (DANA) tool was recently developed to provide a practical means to conduct neurological and psychological assessment in situ. The psychometric properties of the DANA have been previously described. The present work discusses the test–retest reliability of the DANA Rapid test battery, as administered to a homogeneous population of U.S. Air Force Academy football team players (N = 162) across the duration of the season. The intraclass correlation coefficient (ICC) metric of the DANA is compared with that from two different CNTs recently reported in Cole et al., and the implications of using the metric to interpret comparative test reliability among different CNTs are discussed.
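Test-retest reliability of this kind is typically summarized with a two-way random-effects intraclass correlation. Below is a minimal ICC(2,1) sketch under the Shrout-Fleiss formulation, applied to simulated scores whose true reliability is about .80; it illustrates the metric, not the DANA data.

```python
# Two-way random-effects intraclass correlation, ICC(2,1), for test-retest
# scores (rows = examinees, columns = administrations). Simulated data.
import numpy as np

def icc_2_1(Y):
    n, k = Y.shape
    grand = Y.mean()
    MSR = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # rows (subjects)
    MSC = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # columns (sessions)
    SSE = ((Y - Y.mean(axis=1, keepdims=True)
              - Y.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    MSE = SSE / ((n - 1) * (k - 1))
    return (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)

rng = np.random.default_rng(3)
true = rng.normal(50, 10, 200)                     # latent scores, var = 100
scores = np.column_stack([true + rng.normal(0, 5, 200) for _ in range(2)])
print(round(icc_2_1(scores), 3))   # roughly 100 / (100 + 25) = 0.8
```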
  • Spurious
  • Book Review: Advancing Methodologies to Support Both Summative and Formative Assessments
    • Author: Justin L. Kern
    • Journal: Applied Psychological Measurement
    • 2015, Issue 7
  • Comparing the Two- and Three-Parameter Logistic Models via Likelihood Ratio Tests
    Abstract: Selection of an appropriate item response model is critical in the measurement of latent examinee ability. The one-, two-, and three-parameter logistic (1PL, 2PL, and 3PL) models are nested, and as such can be compared using likelihood ratio (LR) tests. The null hypothesis in the LR test for selection between the 2PL and 3PL models sets the guessing parameters to their lower bound of 0. This violates one of the assumptions of the LR test and renders the usual χ2 reference distribution inappropriate for the comparison. A review of the current literature revealed that this problem is not well understood in the educational measurement field. Ignoring this issue can lead to selection of an overly simplified model, with implications for the ability estimates. In this article, the use of the LR test for item response model selection is investigated, with the goal of providing practitioners with an appropriate method of selecting the most parsimonious model. The results of simulation studies indicate the nature of the problem, with inaccurate Type I error rates for cases where the inappropriate null distribution was used. An analysis of data from a statewide mathematics test showed differences pertinent to subsequent analyses.
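The boundary problem has a standard remedy that can be sketched directly: with q guessing parameters fixed at their lower bound, an often-used approximation (assuming independent parameters) replaces the plain chi-square reference with the chi-bar-square mixture sum_m C(q, m) / 2^q * chi2_m. The LR value below is hypothetical.

```python
# Boundary-corrected p-value for a 2PL-vs-3PL LR test with q guessing
# parameters on the boundary, via the equal-weight chi-bar-square mixture
# (a common approximation under an independence assumption).
import numpy as np
from math import comb
from scipy.stats import chi2

def boundary_lr_pvalue(lr_stat, q):
    p = comb(q, 0) / 2**q * float(lr_stat <= 0)   # chi2_0 is a point mass at 0
    for m in range(1, q + 1):
        p += comb(q, m) / 2**q * chi2.sf(lr_stat, df=m)
    return p

lr = 18.3                                          # hypothetical LR statistic
print("naive chi2_10:", round(chi2.sf(lr, 10), 4))
print("chi-bar mix:  ", round(boundary_lr_pvalue(lr, 10), 4))
```

On this toy value the naive chi-square p-value is roughly ten times the mixture p-value, which is the direction of error that pushes practitioners toward the overly simple 2PL.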
  • Kernel Equating Under the Non-Equivalent Groups With Covariates Design
    Abstract: When equating two tests, the traditional approach is to use common test takers and/or common items. Here, the idea is to use variables correlated with the test scores (e.g., school grades and other test scores) as a substitute for common items in a non-equivalent groups with covariates (NEC) design. This is performed in the framework of kernel equating and with an extension of the method developed for post-stratification equating in the non-equivalent groups with anchor test design. Real data from a college admissions test were used to illustrate the use of the design. The equated scores from the NEC design were compared with equated scores from the equivalent group (EG) design, that is, equating with no covariates, as well as with equated scores when a constructed anchor test was used. The results indicate that the NEC design can produce lower standard errors compared with an EG design. When covariates were used together with an anchor test, the smallest standard errors were obtained over a large range of test scores. The finding that an EG design equating can be improved by adjusting for differences in test score distributions caused by differences in the distribution of covariates is useful in practice because not all standardized tests have anchor tests.
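The kernel equating backbone can be sketched in a few lines: continuize both discrete score distributions with a Gaussian kernel and map a form-X score through e(x) = G^{-1}(F(x)). The post-stratification weighting that the NEC design supplies, and the variance-preserving rescaling of full kernel equating, are omitted; the score distributions are made up.

```python
# Gaussian kernel equating sketch: smooth both score distributions, then
# equate via e(x) = G^{-1}(F(x)). Simplified, with illustrative distributions.
import numpy as np
from scipy.stats import norm

def kernel_cdf(x, scores, probs, h=0.6):
    # Kernel-smoothed CDF of a discrete score distribution, bandwidth h
    return float(np.sum(probs * norm.cdf((x - scores) / h)))

def equate(x, scores_X, probs_X, scores_Y, probs_Y):
    Fx = kernel_cdf(x, scores_X, probs_X)
    grid = np.linspace(scores_Y.min() - 2, scores_Y.max() + 2, 2001)
    Gy = np.array([kernel_cdf(g, scores_Y, probs_Y) for g in grid])
    return float(np.interp(Fx, Gy, grid))    # numerical G^{-1}(F(x))

scores = np.arange(11, dtype=float)
pX = np.full(11, 1 / 11)                     # form X: uniform (hypothetical)
pY = np.linspace(1, 2, 11); pY /= pY.sum()   # form Y: tilted toward high scores
print([round(equate(x, scores, pX, scores, pY), 2) for x in (3.0, 5.0, 8.0)])
```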
  • A Multilevel Higher Order Item Response Theory Model for Measuring Latent Growth With Longitudinal Data
    • Author: Hung-Yu Huang
    • Journal: Applied Psychological Measurement
    • 2015, Issue 5
    Abstract: In educational and psychological testing, individuals are often repeatedly measured to assess the changes in their abilities over time or their latent trait growth. If a test consists of several subtests, the latent traits may have a higher order structure, and traditional item response theory (IRT) models for longitudinal data are no longer applicable. In this study, various multilevel higher order item response theory (ML-HIRT) models for simultaneously measuring growth in the second- and first-order latent traits of dichotomous and polytomous items are proposed. A series of simulations conducted using the WinBUGS software with Markov chain Monte Carlo (MCMC) methods reveals that the parameters could be recovered satisfactorily and that latent trait estimation was reliable across measurement times. The application of the ML-HIRT model to longitudinal data sets is illustrated with two empirical examples.
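A small simulation of the higher order structure the ML-HIRT models: a second-order trait drives domain-level first-order traits, which in turn drive 2PL item responses. Loadings, item parameters, and sample sizes are arbitrary, and no estimation (the article's MCMC step) is attempted here.

```python
# Simulate a higher order trait structure: second-order xi -> domain
# traits theta_d -> 2PL responses. Illustrative values throughout.
import numpy as np

rng = np.random.default_rng(4)
n, domains, items_per = 500, 3, 10
lam = np.array([0.8, 0.7, 0.6])              # second-order loadings

xi = rng.normal(size=n)                      # second-order latent trait
theta = lam * xi[:, None] + rng.normal(0, np.sqrt(1 - lam**2), (n, domains))

responses = []
for d in range(domains):
    a = rng.uniform(0.8, 1.6, items_per)     # domain-d item slopes
    b = rng.normal(0, 1, items_per)          # domain-d difficulties
    P = 1 / (1 + np.exp(-a * (theta[:, [d]] - b)))
    responses.append((rng.random((n, items_per)) < P).astype(int))
X = np.hstack(responses)

# Implied domain-trait correlations are approximately lam_d * lam_d'
print(X.shape, np.corrcoef(theta, rowvar=False).round(2))
```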
