Using PRMSE to evaluate automated scoring systems in the presence of label noise

Abstract

The effect of noisy labels on the performance of NLP systems has been studied extensively for system training. In this paper, we focus on the effect that noisy labels have on system evaluation. Using automated scoring as an example, we demonstrate that the quality of human ratings used for system evaluation has a substantial impact on traditional performance metrics, making it impossible to compare system evaluations on labels of different quality. We propose that a new metric, proportional reduction in mean squared error (PRMSE), developed within the educational measurement community, can help address this issue, and provide practical guidelines on using PRMSE.
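The abstract names PRMSE but does not give its formula. The following is a minimal sketch of one common form of the idea, not necessarily the exact estimator used in the paper. It assumes every response has exactly two human ratings, and that rater errors are independent, zero-mean, and of equal variance. The function names (`prmse_two_raters`, `r2_single_rater`, `simulate`) and the simulated data are hypothetical and exist only for illustration.

```python
import random
from statistics import fmean, pvariance


def prmse_two_raters(system, h1, h2):
    """Estimate how much true-score variance the system scores explain.

    Simplified sketch: assumes exactly two ratings per response and
    independent, equal-variance rater errors (an assumption, not a
    detail confirmed by the abstract).
    """
    n = len(system)
    if n < 2 or len(h1) != n or len(h2) != n:
        raise ValueError("need >= 2 responses, each with a system score and two ratings")
    # Under the independence assumption, E[(h1 - h2)^2] = 2 * var_err.
    var_err = fmean((a - b) ** 2 for a, b in zip(h1, h2)) / 2
    h_avg = [(a + b) / 2 for a, b in zip(h1, h2)]
    # Averaging two ratings halves the error variance; subtract that part
    # from both the observed variance and the observed MSE.
    var_true = pvariance(h_avg) - var_err / 2
    mse_true = fmean((s - h) ** 2 for s, h in zip(system, h_avg)) - var_err / 2
    if var_true <= 0:
        raise ValueError("estimated true-score variance is not positive")
    return 1 - mse_true / var_true


def r2_single_rater(system, h):
    """Conventional R^2 of system scores against one (noisy) human rating."""
    mse = fmean((s - x) ** 2 for s, x in zip(system, h))
    return 1 - mse / pvariance(h)


def simulate(n, rater_sd, seed=0):
    """Hypothetical data: unit-variance true scores, a system with error
    SD 0.5, and two raters whose error SD is `rater_sd`."""
    rng = random.Random(seed)
    true = [rng.gauss(0, 1) for _ in range(n)]
    system = [t + rng.gauss(0, 0.5) for t in true]
    h1 = [t + rng.gauss(0, rater_sd) for t in true]
    h2 = [t + rng.gauss(0, rater_sd) for t in true]
    return system, h1, h2


if __name__ == "__main__":
    for rater_sd in (0.3, 0.7):
        system, h1, h2 = simulate(20000, rater_sd)
        print(rater_sd, r2_single_rater(system, h1), prmse_two_raters(system, h1, h2))
```

In this simulation the system is identical in both runs and only the rater noise changes. The conventional R² against a single rating drops as the raters get noisier, while the PRMSE estimate should stay close to the same value, which is the comparability problem the abstract describes.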


Bibliographic details

Source
Workshop on Innovative use of NLP for Building Educational Applications | 2020 | pp. 18-29 | 12 pages
Authors
Anastassia Loukina; Nitin Madnani; Aoife Cahill; Lili Yao; Matthew S. Johnson; Brian Riordan; Daniel F. McCaffrey
Original format
PDF
