Conference: International Conference on Natural Language Processing

The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction



Abstract

This paper introduces the freely available WikEd Error Corpus. We describe the data mining process from Wikipedia revision histories, corpus content and format. The corpus consists of more than 12 million sentences with a total of 14 million edits of various types. As one possible application, we show that WikEd can be successfully adapted to improve a strong baseline in a task of grammatical error correction for English-as-a-Second-Language (ESL) learners' writings by 2.63%. Used together with an ESL error corpus, a composed system gains 1.64% when compared to the ESL-trained system.
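To make the idea of mining edits from revision histories concrete, the sketch below shows one way word-level edits could be extracted from a pair of sentences taken from two revisions of the same text. It is an illustration only and not the authors' pipeline: the function name `extract_edits`, the whitespace tokenization, and the use of Python's `difflib` are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def extract_edits(old_sentence, new_sentence):
    """Return word-level edits between two versions of a sentence.

    Hypothetical helper for illustration; not the WikEd extraction code.
    Each edit is a tuple (operation, old_words, new_words), where
    operation is one of "replace", "delete", or "insert".
    """
    # Assumption: plain whitespace tokenization is good enough for a sketch.
    old_tokens = old_sentence.split()
    new_tokens = new_sentence.split()

    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged spans are not edits
        edits.append((tag, old_tokens[i1:i2], new_tokens[j1:j2]))
    return edits


if __name__ == "__main__":
    before = "He go to school yesterday"
    after = "He went to school yesterday"
    for edit in extract_edits(before, after):
        print(edit)
```

A real extraction process would also need to align sentences across revisions and filter out vandalism or large rewrites; those steps are outside the scope of this sketch.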
