Conference on Empirical Methods in Natural Language Processing (EMNLP)

Large-scale Cloze Test Dataset Created by Teachers

Abstract

Cloze tests are widely adopted in language exams to evaluate students' language proficiency. In this paper, we propose CLOTH, the first large-scale human-created cloze test dataset, containing questions used in middle-school and high-school language exams. Because the blanks are carefully created by teachers and the candidate choices are purposely designed to be nuanced, CLOTH requires deeper language understanding and a wider attention span than previous automatically generated cloze datasets. We test the performance of purpose-built baseline models, including a language model trained on the One Billion Word Corpus, and show that humans outperform them by a significant margin. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability to comprehend long-term context as the key bottleneck.
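One common way to apply a language model to a multiple-choice cloze question is to substitute each candidate into the blank, score the completed passage, and keep the highest-scoring option. The sketch below illustrates only that general idea. It uses a toy add-one (Laplace) smoothed bigram model. The function names (`train_bigram`, `sentence_logprob`, `answer_cloze`), the `_` blank marker, and the toy corpus are hypothetical. This is not the baseline evaluated in the paper, which used a language model trained on the One Billion Word Corpus.

```python
import math
from collections import Counter

# Hypothetical toy illustration of LM-based cloze answering; not the paper's baseline.


def train_bigram(corpus):
    """Count unigrams and bigrams over lowercased, whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_logprob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Counter returns 0 for unseen keys, so unseen words are smoothed, not errors.
        logp += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return logp


def answer_cloze(passage, candidates, unigrams, bigrams, blank="_"):
    """Fill the first blank with each candidate and return the highest-scoring one."""
    scores = {
        c: sentence_logprob(passage.replace(blank, c, 1), unigrams, bigrams)
        for c in candidates
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    corpus = [
        "she went to the store",
        "he went to the park",
        "they walked to the store",
    ]
    uni, bi = train_bigram(corpus)
    print(answer_cloze("she went to the _", ["store", "went", "she"], uni, bi))
```

A bigram model only sees one word on either side of the blank. That is far narrower than the long-term context the abstract identifies as the key bottleneck on CLOTH, so this toy scorer would be expected to fail on questions whose answer depends on earlier sentences.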