Prompt and rater effects in second language writing performance assessment.

Abstract

Performance assessments have become the norm for evaluating language learners' writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually varied systematically: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue prompt and rater effects on test takers' scores, which can affect the validity, reliability, and fairness of these tests.

This study examines these scoring-validity issues using data from the Michigan English Language Assessment Battery (MELAB). The data comprise all official ratings given over a period of more than four years (n = 29,831). The data are modeled with the multi-facet extension of Rasch methodology, which produces measures on a common interval scale.

First, the study investigates the comparability of prompts that differ in topic domain, rhetorical task, prompt length, task constraint, the grammatical person expected in the response, and number of tasks. It also considers whether prompts are differentially difficult for test takers of different genders, language backgrounds, and proficiency levels. Second, the study investigates the quality of raters' ratings and whether they are affected by time, raters' experience, or raters' language background. It also considers whether raters alter their rating behavior depending on their perceptions of prompt difficulty and of test takers' prompt-selection behavior.

The results show that test takers' scores reflect their actual ability on the construct being measured, as operationalized in the rating scale. The scores are generally not affected by a range of prompt dimensions, rater variables, or test-taker characteristics. It can be concluded that scores on this test, and on tests with similar characteristics, have scoring validity. Provided that the other inferences in the validity argument are similarly warranted, these scores can serve as a basis for making appropriate decisions. Further studies are proposed to develop a framework of task difficulty and a model of rater development.
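The abstract names the multi-facet extension of the Rasch model but does not state its equation. The following is a minimal sketch that assumes the common rating-scale form of the many-facet Rasch model. In that form, the log-odds that a response is scored in category k rather than k-1 equals B_n - D_i - C_j - F_k. Here B_n is test-taker ability, D_i is prompt difficulty, C_j is rater severity, and F_k is the category threshold. The function names and all numeric values are hypothetical illustrations. They are not taken from the MELAB data or from the dissertation's own model specification.

```python
import math
from typing import List, Sequence


def mfrm_category_probs(ability: float, prompt_difficulty: float,
                        rater_severity: float,
                        thresholds: Sequence[float]) -> List[float]:
    """Category probabilities under an assumed rating-scale many-facet Rasch model.

    log(P_k / P_{k-1}) = ability - prompt_difficulty - rater_severity - thresholds[k-1]
    for k = 1..len(thresholds); category 0 is the lowest score.
    """
    # Combined location of this test taker / prompt / rater triple on the logit scale.
    eta = ability - prompt_difficulty - rater_severity
    # Running sums of (eta - F_k) give each category's unnormalised log-probability.
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + eta - f)
    # Subtract the maximum before exponentiating, for numerical stability.
    m = max(log_numerators)
    weights = [math.exp(v - m) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs: Sequence[float]) -> float:
    """Expected rating category (0-based) given category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical values: the same test taker and prompt, rated by a lenient and a severe rater.
    steps = [-1.0, 0.0, 1.0]
    lenient = mfrm_category_probs(0.5, 0.0, -0.5, steps)
    severe = mfrm_category_probs(0.5, 0.0, 0.5, steps)
    print(expected_score(lenient), expected_score(severe))
```

Prompt difficulty and rater severity enter the model on the same logit scale as ability. This is what allows a many-facet analysis to separate a harsh rater or a hard prompt from a weak test taker. In the usage example, the severe rater lowers the expected score for an otherwise identical response.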

Bibliographic details

  • Author

    Lim, Gad S.

  • Author affiliation

    University of Michigan

  • Degree-granting institution: University of Michigan
  • Subjects: Education, Bilingual and Multicultural; Language, Rhetoric and Composition; Education, Tests and Measurements
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 220
  • Original format: PDF
  • Language of text: English