
A Simple Recipe for Multilingual Grammatical Error Correction



Abstract

This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets, these models surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German, and Russian. Having established a new set of baselines for GEC, we make our results easily reproducible and accessible by releasing the CLANG-8 dataset. It is produced by using our best model, which we call gT5, to clean the targets of the widely used yet noisy LANG-8 dataset. CLANG-8 greatly simplifies typical GEC training pipelines composed of multiple fine-tuning stages: we demonstrate that performing a single fine-tuning step on CLANG-8 with off-the-shelf language models yields further accuracy improvements over the already top-performing gT5 model for English.
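The first ingredient, language-agnostic synthetic data generation, can be sketched as corrupting clean text with random edits and treating the corrupted sentence as the source and the clean sentence as the target. The sketch below is a minimal illustration, not the authors' exact procedure; the specific operations (drop, swap, insert, case flip) and the edit rate are assumptions:

```python
import random

def corrupt(sentence, rng, edit_rate=0.1):
    """Apply random character-level edits to a clean sentence.

    Character-level operations require no linguistic resources,
    which is what makes the approach language-agnostic.
    """
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < edit_rate:
            op = rng.choice(["drop", "swap", "insert", "case"])
            if op == "drop":
                i += 1          # delete this character
                continue
            if op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])  # transpose adjacent characters
                out.append(chars[i])
                i += 2
                continue
            if op == "insert":
                out.append(chars[i])
                out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
                i += 1
                continue
            if op == "case":
                out.append(chars[i].swapcase())
                i += 1
                continue
        out.append(chars[i])
        i += 1
    return "".join(out)

def make_synthetic_pair(clean_sentence, rng):
    """Return a (source, target) training pair for a seq2seq GEC model."""
    return corrupt(clean_sentence, rng), clean_sentence

rng = random.Random(0)
src, tgt = make_synthetic_pair(
    "Grammatical error correction works in any language.", rng)
```

Pairs generated this way would then serve as pre-training data for a multilingual seq2seq model before the language-specific supervised fine-tuning stage described above.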

