Journal information

  • Journal name: Applied Psychological Measurement
  • Frequency: Eight no. a year, 2008-
  • NLM title:
  • ISO abbreviation: -
  • ISSN: -

  • On the Relationship Between the (Censored) Shifted Wald and the Wiener Distribution as Measurement Models for Choice Response Times
    Abstract: Inferring processes or constructs from performance data is a major hallmark of cognitive psychometrics. Particularly, diffusion modeling of response times (RTs) from correct and erroneous responses using the Wiener distribution has become a popular measurement tool because it provides a set of psychologically interpretable parameters. However, an important precondition to identify all of these parameters is a sufficient number of RTs from erroneous responses. In the present article, we show by simulation that the parameters of the Wiener distribution can be recovered from tasks yielding very high or even perfect response accuracies using the shifted Wald distribution. Specifically, we argue that error RTs can be modeled as correct RTs that have undergone censoring by using techniques from parametric survival analysis. We illustrate our reasoning by fitting the Wiener and (censored) shifted Wald distribution to RTs from six participants who completed a Go/No-go task. In accordance with our simulations, diffusion modeling using the Wiener and the shifted Wald distribution yielded identical parameter estimates when the number of erroneous responses was predicted to be low. Moreover, the modeling of error RTs as censored correct RTs substantially improved the recovery of these diffusion parameters when premature trial timeout was introduced to increase the number of omission errors. Thus, the censored shifted Wald distribution provides a suitable means for diffusion modeling in situations when the Wiener distribution cannot be fitted without parametric constraints.
  • Ignoring a Multilevel Structure in Mixture Item Response Models: Impact on Parameter Recovery and Model Selection
    Abstract: The current study investigated the consequences of ignoring a multilevel structure for a mixture item response model to show when a multilevel mixture item response model is needed. Study 1 focused on examining the consequence of ignoring dependency for within-level latent classes. Simulation conditions that may affect model selection and parameter recovery in the context of a multilevel data structure were manipulated: class-specific ICC, cluster size, and number of clusters. The accuracy of model selection (based on information criteria) and quality of parameter recovery were used to evaluate the impact of ignoring a multilevel structure. Simulation results indicated that, for the range of class-specific ICCs examined here (.1 to .3), mixture item response models which ignored a higher level nesting structure resulted in less accurate estimates and standard errors (SEs) of item discrimination parameters when the number of clusters was larger than 24 and the cluster size was larger than six. Class-varying ICCs can have compensatory effects on bias. Also, the results suggested that a mixture item response model which ignored multilevel structure was not selected over the multilevel mixture item response model based on Bayesian information criterion (BIC) if the number of clusters and cluster size was at least 50, respectively. In Study 2, the consequences of unnecessarily fitting a multilevel mixture item response model to single-level data were examined. Reassuringly, in the context of single-level data, a multilevel mixture item response model was not selected by BIC, and its use would not distort the within-level item parameter estimates or SEs when the cluster size was at least 20. Based on these findings, it is concluded that, for class-specific ICC conditions examined here, a multilevel mixture item response model is recommended over a single-level item response model for a clustered dataset having cluster size > 20 and the number of clusters > 50.
  • A Comparative Evaluation of Kernel Equating and Test Characteristic Curve Equating
    Abstract: This study compares the kernel equating (KE) and test characteristic curve (TCC) equating methods using the nonequivalent anchor test equating design. In this Monte Carlo study, four independent variables were examined: sample size, test length, average form discrimination, anchor test reliability, and the percentage of anchor items. For each condition, there were 100 replications. To assess the performance of TCC equating and KE, the differences between the examinee parametric true scores and the equated estimated expected true scores were examined. The equated scores were based on the average across replications for each condition. Generally speaking, both KE and TCC equating produced accurate results, although KE tended to perform better than TCC on the parametric true score scale across conditions. Past research and the current study’s results seem to indicate that KE should be strongly considered for most equating situations, particularly in light of its flexibility.
  • A Note on the N in the Bayesian Information Criterion for Item Response Models
    Abstract: This brief report derives the N in the penalty term of Schwarz’s (1978) Bayesian information criterion (BIC) for two-parameter logistic item response models. The results in this study show that the N is the number of persons for fixed item models, whereas it is the number of observations (the number of persons times the number of items) for random item models. Given these results, the authors recommend that researchers calculate the BIC themselves or validate the BIC value reported in software output, instead of accepting the output value without further checking the implicit assumptions made by the software.
  • IRT in SPSS Using the SPIRIT Macro
    Abstract:
  • Psychometrics in Support of Learning: From Assessment to Learning
    Abstract:
  • A Hidden Markov Model for Learning Trajectories of Cognitive Skills, With Application to Spatial Rotation Skills
    Abstract: The increasing presence of electronic and online learning resources presents challenges and opportunities for psychometric techniques that can assist in the measurement of abilities and even hasten their mastery. Cognitive diagnosis models (CDMs) are ideal for tracking many fine-grained skills that comprise a domain, and can assist in carefully navigating through the training and assessment of these skills in e-learning applications. A class of CDMs for modeling changes in attributes is proposed, which is referred to as learning trajectories. The authors focus on the development of Bayesian procedures for estimating parameters of a first-order hidden Markov model. An application of the developed model to a spatial rotation experimental intervention is presented.
  • Recommendation System for Adaptive Learning
    Abstract: An adaptive learning system aims at providing instruction tailored to the current status of a learner, differing from the traditional classroom experience. The latest advances in technology make adaptive learning possible, which has the potential to provide students with high-quality learning benefits at a low cost. A key component of an adaptive learning system is a recommendation system, which recommends the next material (video lectures, practices, and so on, on different skills) to the learner, based on the psychometric assessment results and possibly other individual characteristics. An important question then follows: How should recommendations be made? To answer this question, a mathematical framework is proposed that characterizes the recommendation process as a Markov decision problem, for which decisions are made based on the current knowledge of the learner and that of the learning materials. In particular, two plain vanilla systems are introduced, for which the optimal recommendation at each stage can be obtained analytically.
  • Using Automatic Item Generation to Create Solutions and Rationales for Computerized Formative Testing
    Abstract: Computerized testing provides many benefits to support formative assessment. However, the advent of computerized formative testing has also raised formidable new challenges, particularly in the area of item development. Large numbers of diverse, high-quality test items are required because items are continuously administered to students. Hence, hundreds of items are needed to develop the banks necessary for computerized formative testing. One promising approach that may be used to address this test development challenge is automatic item generation. Automatic item generation is a relatively new but rapidly evolving research area where cognitive and psychometric modeling practices are used to produce items with the aid of computer technology. The purpose of this study is to describe a new method for generating both the items and the rationales required to solve the items to produce the required feedback for computerized formative testing. The method for rationale generation is demonstrated and evaluated in the medical education domain.
  • Application of Support Vector Machines to Attribute-Level Classification
    Abstract: Cognitive diagnostic modeling in educational measurement has attracted much attention from researchers in recent years. Its applications in real-world assessments, however, have been lagging behind its theoretical development. Reasons include but are not limited to requirement of large sample size, computational complexity, and lack of model fit. In this article, the authors propose to use the support vector machine (SVM), a popular supervised learning method, to make classification decisions on each attribute (i.e., if the student masters the attribute or not), given a training dataset. By using the SVM, the problem of fitting and calibrating a cognitive diagnostic model (CDM) is converted into a quadratic optimization problem in hyperdimensional space. A classification boundary is obtained from the training dataset and applied to new test takers. The present simulation study considers the training sample size, the error rate in the training sample, the underlying CDM, as well as the structural parameters in the underlying CDM. Results indicate that by using the SVM, classification accuracy rates are comparable with those obtained from previous studies at both the attribute and pattern levels with much smaller sample sizes. The method is also computationally efficient. It therefore has great promise to increase the usability of cognitive diagnostic modeling in educational assessments, particularly small-scale testing programs.
  • A Multilevel Longitudinal Nested Logit Model for Measuring Changes in Correct Response and Error Types
    Abstract: This article presents a multilevel longitudinal nested logit model for analyzing correct response and error types in multilevel longitudinal intervention data collected under a pretest–posttest, cluster randomized trial design. The use of the model is illustrated with a real data analysis, including a model comparison study regarding model complexity and cluster bias. Two substantive research questions regarding the intervention effect on correct response probability and error patterns are investigated using the proposed model. The recovery of item parameters for the proposed model using two sample size conditions is examined via a simulation study. The accuracy of the parameter estimates is comparable with those found in previous studies for the same family of models, except for the intercept parameters of correct responses. Finally, the impact of ignoring cluster membership in the model on the parameter estimation is also studied by fitting a single-level model to multilevel data. Ignoring cluster membership in the model adversely affects the estimation of intercept parameters in correct and error responses.
  • Application of a Learning Diagnosis System in Chinese Classrooms
    Abstract:
  • Exploratory Item Classification via Spectral Graph Clustering
    Abstract: Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as hierarchical clustering, the K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire.
  • Item Response Theory Models for Ipsative Tests With Multidimensional Pairwise Comparison Items
    Abstract: There is re-emerging interest in adopting forced-choice items to address the issue of response bias in Likert-type items for noncognitive latent traits. Multidimensional pairwise comparison (MPC) items are commonly used forced-choice items. However, few studies have been aimed at developing item response theory models for MPC items owing to the challenges associated with ipsativity. Acknowledging that the absolute scales of latent traits are not identifiable in ipsative tests, this study developed a Rasch ipsative model for MPC items that has desirable measurement properties, yields a single utility value for each statement, and allows for comparing psychological differentiation between and within individuals. The simulation results showed a good parameter recovery for the new model with existing computer programs. This article provides an empirical example of an ipsative test on work style and behaviors.
  • Inferential Item-Fit Evaluation in Cognitive Diagnosis Modeling
    Abstract: Research related to the fit evaluation at the item level involving cognitive diagnosis models (CDMs) has been scarce. According to the parsimony principle, balancing goodness of fit against model complexity is necessary. General CDMs require a larger sample size to be estimated reliably, and can lead to worse attribute classification accuracy than the appropriate reduced models when the sample size is small and the item quality is poor, which is typically the case in many empirical applications. The main purpose of this study was to systematically examine the statistical properties of four inferential item-fit statistics: S − X², the likelihood ratio (LR) test, the Wald (W) test, and the Lagrange multiplier (LM) test. To evaluate the performance of the statistics, a comprehensive set of factors, namely, sample size, correlational structure, test length, item quality, and generating model, is systematically manipulated using Monte Carlo methods. Results show that the S − X² statistic has unacceptable power. Type I error and power comparisons favor LR and W tests over the LM test. However, all the statistics are highly affected by the item quality. With a few exceptions, their performance is only acceptable when the item quality is high. In some cases, this effect can be ameliorated by an increase in sample size and test length. This implies that using the above statistics to assess item fit in practical settings when the item quality is low remains a challenge.
  • Examining the Behavior of M₂ and RMSEA₂ in Fitting a Unidimensional Model to Multidimensional Data
    Abstract: It has been widely known that the Type I error rates of goodness-of-fit tests using full information test statistics, such as Pearson’s test statistic χ² and the likelihood ratio test statistic G², are problematic when data are sparse. Under such conditions, the limited information goodness-of-fit test statistic M₂ is recommended in model fit assessment for models with binary response data. A simulation study was conducted to investigate the power and Type I error rate of M₂ in fitting unidimensional models to many different types of multidimensional data. As an additional interest, the behavior of RMSEA₂ was also examined, which is the root mean square error of approximation (RMSEA) based on M₂. Findings from the current study showed that M₂ and RMSEA₂ are sensitive in detecting the misfits due to varying slope parameters, the bifactor structure, and the partially (or completely) simple structure for multidimensional data, but not the misfits due to the within-item multidimensional structures.
  • Is a Computerized Adaptive Test More Motivating Than a Fixed-Item Test?
    Abstract: Computer adaptive tests provide important measurement advantages over traditional fixed-item tests, but research on the psychological reactions of test takers to adaptive tests is lacking. In particular, it has been suggested that test-taker engagement, and possibly test performance as a consequence, could benefit from the control that adaptive tests have on the number of test items examinees answer correctly. However, previous research on this issue found little support for this possibility. This study expands on previous research by examining this issue in the context of a mathematical ability assessment and by considering the possible effect of immediate feedback of response correctness on test engagement, test anxiety, time on task, and test performance. Middle school students completed a mathematics assessment under one of three test type conditions (fixed, adaptive, or easier adaptive) and either with or without immediate feedback about the correctness of responses. Results showed little evidence for test type effects. The easier adaptive test resulted in higher engagement and lower anxiety than either the adaptive or fixed-item tests; however, no significant differences in performance were found across test types, although performance was significantly higher across all test types when students received immediate feedback. In addition, these effects were not related to ability level, as measured by the state assessment achievement levels. The possibility that test experiences in adaptive tests may not in practice be significantly different than in fixed-item tests is raised and discussed to explain the results of this and previous studies.
  • Entropy-Based Measures of Person Fit in Item Response Theory
    Abstract: This article introduces three new variants of entropy to detect person misfit (Ei, EMi, and EMRi), and provides preliminary evidence that these measures are worthy of further investigation. Previously, entropy has been used as a measure of approximate data–model fit to quantify how well individuals are classified into latent classes, and to quantify the quality of classification and separation between groups in logistic regression models. In the current study, entropy is explored through conceptual examples and Monte Carlo simulation comparing entropy with established measures of person fit in item response theory (IRT) such as lz, lz*, U, and W. Simulation results indicated that EMi and EMRi were successfully able to detect aberrant response patterns when comparing contaminated and uncontaminated subgroups of persons. In addition, EMi and EMRi performed similarly in showing separation between the contaminated and uncontaminated subgroups. However, EMRi may be advantageous over other measures when subtests include a small number of items. EMi and EMRi are recommended for use as approximate person-fit measures for IRT models. These measures of approximate person fit may be useful in making relative judgments about potential persons whose response patterns do not fit the theoretical model.
  • Parameter Recovery in Multidimensional Item Response Theory Models Under Complexity and Nonnormality
    Abstract: Information about the psychometric properties of items can be highly useful in assessment development, for example, in item response theory (IRT) applications and computerized adaptive testing. Although literature on parameter recovery in unidimensional IRT abounds, less is known about parameter recovery in multidimensional IRT (MIRT), notably when tests exhibit complex structures or when latent traits are nonnormal. The current simulation study focuses on investigation of the effects of complex item structures and the shape of examinees’ latent trait distributions on item parameter recovery in compensatory MIRT models for dichotomous items. Outcome variables included bias and root mean square error. Results indicated that when latent traits were skewed, item parameter recovery was generally adversely impacted. In addition, the presence of complexity contributed to decreases in the precision of parameter recovery, particularly for discrimination parameters along one dimension when at least one latent trait was generated as skewed.
  • Mismatch Between Ability and Prior Distributions: Common-Item Linking Methods
    • Author: Brandon LeBeau
    • Journal: Applied Psychological Measurement
    • 2017, Issue 7
    Abstract: Linking of two forms is an important task when using item response theory, particularly when two forms are administered to nonequivalent groups. When linking with characteristic curve methods, the ability distribution and weights associated with that distribution can be used to weight observations differently. These are commonly specified as equally spaced intervals from −4 to 4, but other options or distributional forms can be specified. The use of these different distributions and weights of the ability distributions will be explored with a Monte Carlo simulation. Primary simulation conditions will include sample size, number of items, number of common items, ability distribution, and randomly varying population transformation constants. Study results show that the linking weights have little impact on the estimation of the linking constants; however, the underlying ability distribution of examinees does have significant impact. Implications for applied researchers will be discussed.
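
The last abstract mentions ability points "equally spaced ... from −4 to 4" and the weights attached to them. As a rough illustration only, the sketch below builds such a grid and two possible weight sets. The function names, the choice of 41 points, and the normal and uniform weighting schemes are all assumptions made for this example; they are not taken from the article and are not claimed to match the study's settings.

```python
import math

def ability_grid(n_points=41, lower=-4.0, upper=4.0):
    """Equally spaced ability points on [lower, upper] (n_points is an assumed value)."""
    span = upper - lower
    return [lower + span * i / (n_points - 1) for i in range(n_points)]

def normal_weights(points, mean=0.0, sd=1.0):
    """Weights proportional to a normal density, rescaled to sum to 1 (illustrative choice)."""
    dens = [math.exp(-0.5 * ((x - mean) / sd) ** 2) for x in points]
    total = sum(dens)
    return [d / total for d in dens]

def uniform_weights(points):
    """Equal weight on every ability point (illustrative alternative)."""
    return [1.0 / len(points)] * len(points)

points = ability_grid()
w_normal = normal_weights(points)
w_uniform = uniform_weights(points)
```

Under the normal scheme most of the weight sits near an ability of 0, while the uniform scheme treats every point on the grid alike; the abstract reports that the choice of weights mattered little for the linking constants compared with the examinees' underlying ability distribution.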
