International Conference on Language Resources and Evaluation

Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language

Abstract

The NAIST Lang-8 Learner Corpora (Lang-8 corpus) is one of the largest second-language learner corpora. The Lang-8 corpus is suitable as a training dataset for machine translation-based grammatical error correction systems. However, it is not suitable as an evaluation dataset because its corrected sentences are sometimes inappropriate. Therefore, we created and released an evaluation corpus for correcting grammatical errors made by learners of Japanese as a Second Language (JSL). Because our corpus has less noise and its annotation scheme reflects the characteristics of the dataset, it is ideal as an evaluation corpus for correcting grammatical errors in sentences written by JSL learners. In addition, we applied neural machine translation (NMT) and statistical machine translation (SMT) techniques to correct the grammar of JSL learners' sentences, evaluated the results using our corpus, and compared the performance of the NMT system with that of the SMT system.
