
A7'ta: Data on a monolingual Arabic parallel corpus for grammar checking


Abstract

Grammar error correction can be treated as a “translation” problem in which an erroneous sentence is “translated” into a correct version of the same sentence in the same language. This can be done with techniques such as Statistical Machine Translation (SMT) or Neural Machine Translation (NMT). Building SMT or NMT models for grammar correction requires monolingual parallel corpora in the target language. This data article presents a monolingual parallel corpus of Arabic text called A7'ta. It contains 470 erroneous sentences and their 470 error-free counterparts. The corpus can serve as a linguistic resource for Arabic natural language processing (NLP), mainly for training sequence-to-sequence models for grammar checking. Sentences were manually collected from a book written as a guide to correct Arabic writing, grammar, and other linguistic features. Although a number of Arabic corpora of errors and corrections are available [2], such as QALB [10] and the Arabic Learner Corpus [11], the data presented in this article aims to increase the number of freely available Arabic corpora of errors and corrections by providing a detailed error specification and leveraging the work of language experts.

Keywords: Error checking, Arabic language, NLP, Parallel corpus
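The "translation" framing in the abstract amounts to treating each erroneous sentence as a source and its corrected counterpart as a target. The sketch below shows one way such pairs might be assembled before training a sequence-to-sequence model. It is only an illustration: the function name `build_pairs`, the position-based alignment, and the toy English sentences are all assumptions, and none of it reflects the corpus's actual file format or contents.

```python
from typing import List, Tuple


def build_pairs(erroneous: List[str], corrected: List[str]) -> List[Tuple[str, str]]:
    """Align erroneous sentences with their corrections by position.

    Assumes (hypothetically) that sentence i in `erroneous` corresponds to
    sentence i in `corrected`, as in a line-aligned parallel corpus.
    """
    if len(erroneous) != len(corrected):
        raise ValueError(
            f"unaligned corpus: {len(erroneous)} erroneous vs "
            f"{len(corrected)} corrected sentences"
        )
    pairs = []
    for source, target in zip(erroneous, corrected):
        source, target = source.strip(), target.strip()
        # Skip pairs where either side is empty after trimming whitespace.
        if source and target:
            pairs.append((source, target))
    return pairs


# Toy placeholder sentences for illustration only, NOT taken from the corpus.
toy_erroneous = ["She go to school .", "He have a book ."]
toy_corrected = ["She goes to school .", "He has a book ."]

pairs = build_pairs(toy_erroneous, toy_corrected)
for source, target in pairs:
    print(f"{source}  ->  {target}")
```

Under these assumptions, a corpus with 470 erroneous and 470 corrected sentences would yield up to 470 (source, target) pairs. The length check guards against a misaligned corpus being paired silently.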