...
Applied Measurement in Education

Validating Automated Essay Scoring: A (Modest) Refinement of the 'Gold Standard'



Abstract

By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to data from a major testing program and one system of automated essay scoring. The refinement capitalizes on the fact that essay raters differ in numerous ways (e.g., training and experience), any of which may affect the quality of ratings. We found that automated scores exhibited different correlations with scores awarded by experienced raters (a more compelling criterion) than with those awarded by untrained raters (a less compelling criterion). The results suggest potential for a refined machine-human agreement model that differentiates raters with respect to experience, expertise, and possibly even more salient characteristics.
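A minimal sketch of the agreement comparison the abstract describes: correlating one set of automated scores against scores from experienced raters and against scores from untrained raters, then comparing the two correlations. The data below are hypothetical (the paper's testing-program data and scoring system are not reproduced here), and the rater noise levels are illustrative assumptions only.

```python
# Hedged illustration of the refined human-machine agreement model:
# the same machine scores are correlated with two rater criteria of
# differing quality. All score arrays are simulated, not the paper's data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_essays = 200

# Hypothetical latent essay quality, observed with different error levels.
quality = rng.normal(0.0, 1.0, n_essays)
machine = quality + rng.normal(0.0, 0.5, n_essays)      # automated scores
experienced = quality + rng.normal(0.0, 0.4, n_essays)  # experienced raters (assumed more reliable)
untrained = quality + rng.normal(0.0, 0.9, n_essays)    # untrained raters (assumed noisier)

r_exp, _ = pearsonr(machine, experienced)
r_untr, _ = pearsonr(machine, untrained)

print(f"machine vs. experienced raters: r = {r_exp:.2f}")
print(f"machine vs. untrained raters:   r = {r_untr:.2f}")
```

Under these assumptions the machine scores agree more strongly with the experienced raters than with the untrained ones, mirroring the paper's point that the choice of human criterion matters when human-machine agreement is used as validity evidence.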
