Journal of Science Education and Technology

A Meta-Analysis of Machine Learning-Based Science Assessments: Factors Impacting Machine-Human Score Agreements


Abstract

Machine learning (ML) has been increasingly employed in science assessment to facilitate automatic scoring, although with varying degrees of success (i.e., varying magnitudes of machine-human score agreement [MHA]). Little work has empirically examined the factors driving MHA disparities in this growing field, which constrains the improvement of machine scoring capacity and its wider application in science education. We performed a meta-analysis of 110 studies of MHAs in order to identify the factors most strongly contributing to scoring success (i.e., high Cohen's kappa [kappa]). We empirically examined six factors proposed as contributors to MHA magnitudes: algorithm, subject domain, assessment format, construct, school level, and machine supervision type. Our analyses of the 110 MHAs revealed substantial heterogeneity in kappa (weighted mean = .64; range = .09-.97). Using three-level random-effects modeling, MHA heterogeneity was explained by variability both within publications (i.e., the assessment task level: 82.6%) and between publications (i.e., the individual study level: 16.7%). Our results also suggest that all six factors have significant moderator effects on scoring success magnitudes. Among these, algorithm and subject domain had significantly larger effects than the other factors, suggesting that technical features and assessment-external features might be primary targets for improving MHAs and ML-based science assessments.
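The agreement metric at the center of this meta-analysis, Cohen's kappa, corrects raw machine-human agreement for agreement expected by chance. A minimal sketch of the computation is below; the machine and human score vectors are invented purely for illustration and are not from the studies analyzed.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters: (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement under independence, from each rater's marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical machine vs. human scores on a three-level rubric (0/1/2).
human   = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
machine = [0, 1, 2, 0, 0, 2, 1, 2, 0, 2]
print(round(cohen_kappa(human, machine), 2))  # raw agreement is .80; kappa is lower
```

Because kappa discounts chance agreement, it is a stricter criterion than percent agreement, which is why the meta-analysis reports kappa rather than raw MHA percentages.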
