Journal: Large-scale Assessments in Education

Detecting differential item functioning using generalized logistic regression in the context of large-scale assessments

Abstract

Background: When studying student performance across different countries or cultures, score comparability is essential. In other words, if our inferences regarding performance are to be regarded as valid, the latent variable (i.e., the construct of interest) must be understood and measured equivalently across all participating groups or countries. Relatively few studies have examined an item-level approach to measurement equivalence, particularly in settings that include a large number of groups.

Methods: This simulation study examines item-level differential item functioning (DIF) in the context of international large-scale assessment (ILSA) using a generalized logistic regression approach. Manipulated factors included the number of groups (10 or 20), the magnitude of DIF, the percent of DIF items, the nature of DIF, and the percent of groups affected by DIF.

Results: The number of groups did not affect the performance of the method, which showed high power and low Type I error rates at both levels; other factors, however, did affect its accuracy. Specifically, Type I error rates were inflated in non-DIF conditions, whereas they were very conservative in all of the DIF conditions. Power was generally high, particularly in conditions where the DIF magnitude was large, with one exception: conditions where DIF was introduced in the difficulty parameters and 60% of the items had DIF.

Conclusions: Our findings present a mixed picture of the performance of the generalized logistic regression method when there are many groups with large sample sizes. In the presence of DIF, the method successfully distinguished between DIF and non-DIF items, as evidenced by low Type I error rates and high power. In the absence of DIF, however, the method yielded inflated Type I error rates.
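To make the Methods description concrete, the following is a minimal sketch of a multi-group logistic regression DIF test for a single item, under stated assumptions. It is not the authors' code. It compares a null model (intercept plus standardized total score) against a full model that adds group intercepts (uniform DIF) and group-by-score interactions (nonuniform DIF), using a likelihood-ratio test with 2(G - 1) degrees of freedom for G groups. All function names, the Rasch-type toy data, the three groups, the sample size, and the DIF shift are hypothetical choices for illustration, not settings from the study.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def max_loglik(X, y, iters=50):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        # Tiny diagonal damping keeps the Hessian invertible.
        hess = [[1e-8 * (i == j) for j in range(p)] for i in range(p)]
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for i in range(p):
                grad[i] += (yi - mu) * row[i]
                for j in range(p):
                    hess[i][j] += mu * (1.0 - mu) * row[i] * row[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    ll = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    half = max(x, 0.0) / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def glr_dif_test(y, score, group, n_groups):
    """Likelihood-ratio test of the group and group-by-score terms for one item."""
    mean = sum(score) / len(score)
    sd = math.sqrt(sum((s - mean) ** 2 for s in score) / len(score))
    z = [(s - mean) / sd for s in score]  # standardized matching variable
    null_X = [[1.0, zi] for zi in z]
    full_X = []
    for zi, g in zip(z, group):
        dummies = [1.0 if g == k else 0.0 for k in range(1, n_groups)]  # group 0 is the reference
        full_X.append([1.0, zi] + dummies + [zi * d for d in dummies])
    lr = 2.0 * (max_loglik(full_X, y) - max_loglik(null_X, y))
    df = 2 * (n_groups - 1)  # always even, so chi2_sf_even applies
    return lr, df, chi2_sf_even(lr, df)

def simulate(n_groups=3, n_per_group=400, n_items=10, dif_shift=1.5, seed=1):
    """Rasch-type toy data; item 0 is harder by dif_shift in the last group only."""
    rng = random.Random(seed)
    difficulty = [-1.0 + 2.0 * j / (n_items - 1) for j in range(n_items)]
    responses, group = [], []
    for g in range(n_groups):
        for _ in range(n_per_group):
            theta = rng.gauss(0.0, 1.0)
            row = []
            for j, b in enumerate(difficulty):
                if j == 0 and g == n_groups - 1:
                    b += dif_shift  # uniform DIF in the difficulty parameter
                row.append(1 if rng.random() < 1.0 / (1.0 + math.exp(b - theta)) else 0)
            responses.append(row)
            group.append(g)
    return responses, group

if __name__ == "__main__":
    responses, group = simulate()
    total = [sum(r) for r in responses]
    for item in (0, 5):  # item 0 carries simulated DIF, item 5 does not
        lr, df, p = glr_dif_test([r[item] for r in responses], total, group, 3)
        print(f"item {item}: LR = {lr:.2f}, df = {df}, p = {p:.4g}")
```

The sketch uses the raw total score (including the studied item) as the matching variable and tests uniform and nonuniform DIF jointly. A real analysis with 10 or 20 groups would more likely rely on an established statistics package and might add item purification or a multiple-testing correction, none of which is shown here.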
