Venue: International Conference on Natural Language Processing

The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction


Abstract

This paper introduces the freely available WikEd Error Corpus. We describe the data mining process from Wikipedia revision histories, corpus content and format. The corpus consists of more than 12 million sentences with a total of 14 million edits of various types. As one possible application, we show that WikEd can be successfully adapted to improve a strong baseline in a task of grammatical error correction for English-as-a-Second-Language (ESL) learners' writings by 2.63%. Used together with an ESL error corpus, a composed system gains 1.64% when compared to the ESL-trained system.
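To make the notion of an "edit" between two versions of a sentence concrete, here is a minimal sketch that aligns an older and a newer sentence at the token level and lists the differing spans. It is only an illustration under stated assumptions: the function name `extract_edits`, the whitespace tokenization, and the example sentences are hypothetical, and the use of Python's standard `difflib` module is a stand-in, not the extraction procedure described in the paper.

```python
import difflib


def extract_edits(old_sentence, new_sentence):
    """Return a list of (operation, old_tokens, new_tokens) for each changed span.

    Hypothetical helper for illustration only; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    old_tokens = old_sentence.split()
    new_tokens = new_sentence.split()
    # autojunk=False disables difflib's heuristic that ignores very frequent items.
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / insert / delete spans
            edits.append((tag, old_tokens[i1:i2], new_tokens[j1:j2]))
    return edits


# Invented example pair, not taken from the corpus.
print(extract_edits("He go to school yesterday .", "He went to school yesterday ."))
# → [('replace', ['go'], ['went'])]
```

A sentence pair with no differences yields an empty list, so such pairs could be filtered out before any further processing.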
