How to even the score: an investigation into how native and Arab non-native teachers of English rate essays containing short and long sentences.


Abstract

In the field of education, test scores are meant to provide an indication of test-takers' knowledge or abilities. The validity of tests must be rigorously investigated to ensure that the scores obtained are meaningful and fair. Owing to the subjective nature of the scoring process, rater variation is a major threat to the validity of performance-based language testing (i.e., speaking and writing). This investigation explores the influence of two main effects on writing test scores awarded with an analytic rating scale. The first main effect is raters' first language (native versus non-native English). The second is the essays' average sentence length (essays with short sentences versus essays with long sentences). The interaction between the two main effects is also analyzed. Sixty teachers of English as a second or foreign language (30 native and 30 non-native speakers) working in Kuwait used a 9-point analytic rating scale with four criteria to rate 24 essays of contrasting average sentence length (12 essays with short sentences on average and 12 with long sentences). Multi-Facet Rasch Measurement (using the FACETS program, version 3.71.4) showed that: (1) the raters differed significantly in overall severity; (2) there were a number of significant bias interactions between raters' first language and the essays' average sentence length; (3) the native raters generally overestimated the essays with short sentences, awarding higher scores than expected, and underestimated the essays with long sentences, awarding lower scores than expected, while the non-native raters displayed the reverse pattern. This pattern appeared on all four criteria of the analytic rating scale. Furthermore, there was a significant interaction between raters and criteria, especially for the criterion 'Grammatical range and accuracy'. Two sets of interviews were subsequently carried out. The first set had many limitations and its findings were not deemed adequate.
The second set of interviews showed that raters were not influenced by sentence length per se, but awarded higher or lower scores than expected mainly because of content and ideas, paragraphing, and vocabulary. This focus is most likely a result of the highly problematic writing assessment rubric of the Ministry of Education, Kuwait. The limitations and implications of this investigation are then discussed.
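The study's grouping of essays into "short-sentence" and "long-sentence" sets rests on computing each essay's mean sentence length. A minimal sketch of that computation is below; the sentence-splitting rule (splitting on `.`, `!`, `?`) and the 15-word threshold are illustrative assumptions, not details taken from the thesis.

```python
import re


def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence.

    Sentences are split on runs of . ! ? -- a simplifying assumption
    that ignores abbreviations, decimals, etc.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)


def classify_essay(text: str, threshold: float = 15.0) -> str:
    """Label an essay 'short' or 'long' by mean sentence length.

    The 15-word threshold is hypothetical; the study does not report
    the exact cut-off used to build the two groups of 12 essays.
    """
    return "short" if average_sentence_length(text) < threshold else "long"
```

For example, `classify_essay("I ran. We hid.")` returns `"short"`, since both sentences average two words.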

Record details

  • Author: Ameer Saleh
  • Year: 2017
  • Format: PDF

