Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data


Abstract

Considerable effort has been made to address the data sparsity problem in neural grammatical error correction. In this work, we propose a simple and surprisingly effective unsupervised synthetic error generation method based on confusion sets extracted from a spellchecker to increase the amount of training data. Synthetic data is used to pre-train a Transformer sequence-to-sequence model, which not only improves over a strong baseline trained on authentic error-annotated data, but also enables the development of a practical GEC system in a scenario where little genuine error-annotated data is available. The developed systems placed first in the BEA 2019 shared task, achieving 69.47 and 64.24 F₀.₅ in the restricted and low-resource tracks respectively, both on the W&I+LOCNESS test set. On the popular CoNLL 2014 test set, we report state-of-the-art results of 64.16 M² for the submitted system, and 61.30 M² for the constrained system trained on the NUCLE and Lang-8 data.
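The core of the method is turning abundant monolingual text into (noisy, clean) sentence pairs for pre-training. The following is a minimal Python sketch of confusion-set-based error generation under stated assumptions: the confusion sets, error-type inventory, and probabilities below are illustrative placeholders, not the authors' actual configuration, which extracts confusion sets from a spellchecker such as Aspell and tunes the error distribution.

```python
import random

# Toy confusion sets standing in for ones extracted from a spellchecker;
# these entries are illustrative, not the sets used in the paper.
CONFUSION_SETS = {
    "their": ["there", "they're"],
    "affect": ["effect"],
    "then": ["than"],
    "to": ["too", "two"],
    "its": ["it's"],
}

def corrupt(tokens, p_err=0.15, rng=random):
    """Inject synthetic errors into a clean token sequence.

    Each token is corrupted with probability p_err by one of:
    substitution from its confusion set, deletion, swap with the
    next token, or duplication. The weights are assumed values.
    """
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if rng.random() < p_err:
            op = rng.choices(
                ["substitute", "delete", "swap", "duplicate"],
                weights=[0.5, 0.2, 0.2, 0.1],
            )[0]
            if op == "substitute" and tok.lower() in CONFUSION_SETS:
                out.append(rng.choice(CONFUSION_SETS[tok.lower()]))
            elif op == "delete":
                pass  # drop the token entirely
            elif op == "swap" and i + 1 < len(tokens):
                out.extend([tokens[i + 1], tok])
                i += 1  # consume the swapped neighbour as well
            elif op == "duplicate":
                out.extend([tok, tok])
            else:
                out.append(tok)  # no applicable edit; keep token
        else:
            out.append(tok)
        i += 1
    return out

# Build one (source, target) pre-training pair from clean text:
clean = "I would like to thank their team for the support".split()
noisy = corrupt(clean, rng=random.Random(0))
print(" ".join(noisy), "->", " ".join(clean))
```

Pairs generated this way would serve as pre-training data for the Transformer sequence-to-sequence model, with fine-tuning on authentic error-annotated corpora afterwards.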
