
Predicting differential item functioning in cross-lingual testing: The case of a high stakes test in the Kyrgyz Republic.



Abstract

Cross-lingual tests are assessment instruments created in one language and adapted for use with another language group. Practitioners and researchers use cross-lingual tests for various descriptive, analytical, and selection purposes, both in comparative studies across nations and within countries marked by linguistic diversity (Hambleton, 2005). Because of cultural, contextual, psychological, and linguistic differences between populations, adapting test items for use across groups is a challenging endeavor. The validity of inferences based on cross-lingual tests can only be assured if the content, meaning, and difficulty of test items are similar in the different language versions of the test (Ercikan, 2002).

Of paramount importance in the test adaptation process is the proven ability of test developers to adapt test items across groups in meaningful ways. One way investigators assess the level of item equivalence on a cross-lingual assessment is to analyze items for differential item functioning, or DIF. DIF is present when examinees from different language groups do not have the same probability of responding correctly to a given item after controlling for examinee ability (Camilli & Shepard, 1994). To detect and minimize DIF, test developers employ both statistical methods and substantive (judgmental) reviews of cross-lingual items. In the Kyrgyz Republic, item developers rely on substantive review of items by bilingual professionals. In situations where statistical DIF detection methods are not typically used, the accuracy of such professionals in discerning differences in content, meaning, and difficulty between items is especially important.

In this study, the accuracy of bilingual evaluators' predictions about whether differences between Kyrgyz- and Russian-language test items would lead to DIF was evaluated. The items came from a cross-lingual university scholarship test in the Kyrgyz Republic.
Evaluators' predictions were compared to a statistical test of "no difference" in response patterns by group using the logistic regression (LR) DIF detection method (Swaminathan & Rogers, 1990). A small number of test items were estimated to have "practical statistical DIF." There was a modest, positive correlation between evaluators' predictions and statistical DIF levels. However, with the exception of one item type, sentence completion, evaluators were unable to consistently predict which language group was favored by the differences. Plausible explanations for this finding, as well as ways to improve the accuracy of substantive review, are offered.

Data were also collected to determine the primary sources of DIF in order to inform the test development and adaptation process in the republic. Most of the causes of DIF were attributed to highly contextual (within-item) sources of difference related to overt adaptation problems. However, inherent language differences were also noted: syntax issues with the sentence completion items made adapting this item type from Russian into Kyrgyz problematic. Statistical and substantive data indicated that the reading comprehension items were less problematic to adapt than the analogy and sentence completion items. I analyze these findings, interpret their implications for key stakeholders, provide recommendations for improving the process of adapting items from Russian into Kyrgyz, and highlight cautions in interpreting the data collected in this study.
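The logistic regression DIF procedure cited above (Swaminathan & Rogers, 1990) can be sketched in code. The sketch below is an illustrative implementation on simulated data, not the dissertation's analysis: it fits a baseline model predicting item correctness from ability alone, then an augmented model adding group membership and an ability-by-group interaction, and flags the item if the likelihood-ratio statistic is large. All variable names and simulation parameters are assumptions; in operational practice the matching variable is usually the examinee's total test score rather than a known ability value.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson; return coefficients and log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                                  # observation weights
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - p))          # Newton ascent step
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    ll = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return beta, ll

def lr_dif_statistic(ability, group, correct):
    """Likelihood-ratio test combining uniform and non-uniform DIF (2 df)."""
    ones = np.ones(len(ability))
    X0 = np.column_stack([ones, ability])                           # ability only
    X1 = np.column_stack([ones, ability, group, ability * group])   # + group, interaction
    _, ll0 = fit_logistic(X0, correct)
    _, ll1 = fit_logistic(X1, correct)
    return 2.0 * (ll1 - ll0)   # compare to a chi-square with 2 df (5.99 at alpha = .05)

# Simulated item that is uniformly harder for group 1 at equal ability levels.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n).astype(float)
ability = rng.normal(0.0, 1.0, n)
logit = 1.2 * ability - 0.8 * group        # the -0.8 term is uniform DIF against group 1
correct = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

chi2 = lr_dif_statistic(ability, group, correct)
print("LR chi-square =", round(chi2, 1),
      "-> flag item for substantive review" if chi2 > 5.99 else "-> no DIF flagged")
```

A significant group coefficient with a non-significant interaction indicates uniform DIF (one group consistently favored), while a significant interaction indicates non-uniform DIF, where the favored group changes across ability levels.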

Bibliographic record

  • Author: Drummond, Todd W.
  • Affiliation: Michigan State University.
  • Degree-granting institution: Michigan State University.
  • Subjects: Education Tests and Measurements; Education Policy; Slavic Studies.
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 314 p.
  • Format: PDF
  • Language: English (eng)
