
A Framework for Indonesian Grammar Error Correction



Abstract

Grammatical Error Correction (GEC) is a challenge in Natural Language Processing research. Although many researchers have focused on GEC for widely used languages such as English or Chinese, few studies address Indonesian, which is a low-resource language. In this article, we propose a GEC framework that has the potential to serve as a baseline method for Indonesian GEC tasks. The framework treats GEC as a multi-class classification task. It integrates different language embedding models and deep learning models to correct 10 types of Part-of-Speech (POS) errors in Indonesian text. In addition, we constructed an Indonesian corpus that can be used as an evaluation dataset for Indonesian GEC research, and we evaluated our framework on this dataset. Results show that the Long Short-Term Memory model based on word embeddings achieved the best performance: its overall macro-average F0.5 in correcting the 10 POS error types reached 0.551. Results also show that the framework can be trained on a low-resource dataset.
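The macro-average F0.5 reported in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the authors computed the metric, so the code below assumes the standard definition: a per-class F-beta with beta = 0.5 (which weights precision more heavily than recall), averaged without weighting across classes. The function names `f_beta` and `macro_f_beta` and the toy labels are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(gold, pred, beta=0.5):
    """Unweighted mean of per-class F-beta over the classes present in gold."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for label in sorted(set(gold)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        scores.append(f_beta(precision, recall, beta))
    return sum(scores) / len(scores)


# Hypothetical toy labels for illustration only (not the paper's data).
gold = ["NOUN", "VERB", "NOUN", "PREP"]
pred = ["NOUN", "NOUN", "NOUN", "PREP"]
score = macro_f_beta(gold, pred)
print(round(score, 3))
```

Because the average is unweighted, a rare error type that the model never corrects properly pulls the macro score down as much as a frequent one would, which is one reason macro averaging is a common choice when classes are imbalanced.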

