
ERRANT: Assessing and Improving Grammatical Error Type Classification




Grammatical Error Correction (GEC) is the task of correcting different types of errors in written text. Building GEC systems requires large amounts of annotated data containing erroneous sentences. This data, however, is usually annotated according to each annotator's own standards, which makes it difficult to work with multiple datasets at the same time. The recently introduced Error Annotation Toolkit (ERRANT) tackled this problem by providing a way to automatically annotate data that contain grammatical errors, while also standardising the annotation. ERRANT extracts the errors and classifies them into error types, in the form of edits that can be used to build GEC systems as well as for grammatical error analysis. However, we observe that certain errors are falsely or ambiguously classified. This could obstruct any qualitative or quantitative analysis of grammatical error types, as the results would be inaccurate. In this work, we use a sample of the FCE corpus (Yannakoudakis et al., 2011) for secondary error type annotation, and we show that up to 39% of the annotations of the most frequent type should be re-classified. Our corrections will be publicly released, so that they can serve as the starting point of a broader, collaborative, ongoing correction process.
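As a rough illustration of how a re-classification rate of this kind could be computed, the sketch below compares an automatically assigned error type with a secondary, manually assigned type for each edit, and reports the share of edits per automatic type whose label changed. The function name `reclassification_rates` and the toy label pairs are assumptions made for illustration only; the labels merely follow the style of ERRANT type tags and are not data or results from this work.

```python
from collections import Counter


def reclassification_rates(pairs):
    """Return {auto_type: share of edits whose manual type differs}.

    `pairs` is an iterable of (auto_type, manual_type) tuples, one per edit.
    This helper is hypothetical and not part of ERRANT.
    """
    totals = Counter()
    changed = Counter()
    for auto_type, manual_type in pairs:
        totals[auto_type] += 1
        if manual_type != auto_type:
            changed[auto_type] += 1
    return {t: changed[t] / totals[t] for t in totals}


# Made-up toy labels for illustration, NOT data from the paper.
toy_pairs = [
    ("R:OTHER", "R:OTHER"),
    ("R:OTHER", "R:VERB"),
    ("R:OTHER", "R:NOUN"),
    ("R:OTHER", "R:OTHER"),
    ("M:DET", "M:DET"),
    ("R:PREP", "R:PREP"),
]

rates = reclassification_rates(toy_pairs)

# The most frequent automatically assigned type in the toy sample.
most_frequent = Counter(auto for auto, _ in toy_pairs).most_common(1)[0][0]
print(most_frequent, rates[most_frequent])
```

Keeping the totals and the changed counts in separate counters makes it easy to report the rate for any type, not only the most frequent one.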


