Source: Data in Brief (U.S. National Institutes of Health literature)

A7׳ta: Data on a monolingual Arabic parallel corpus for grammar checking


Abstract

Grammar error correction can be treated as a "translation" problem, in which an erroneous sentence is "translated" into a correct version of the same sentence in the same language. This can be accomplished with techniques such as Statistical Machine Translation (SMT) or Neural Machine Translation (NMT). Building SMT or NMT models for grammar correction requires monolingual parallel corpora in the target language.

This data article presents a monolingual parallel corpus of Arabic text called A7׳ta. It contains 470 erroneous sentences and their 470 error-free counterparts. The corpus can serve as a linguistic resource for Arabic natural language processing (NLP), mainly for training sequence-to-sequence models for grammar checking. Sentences were manually collected from a book prepared as a guide to writing Arabic correctly, covering grammar and other linguistic features. Although a number of Arabic corpora of errors and corrections are already available [2], such as QALB [10] and the Arabic Learner Corpus [11], the data presented in this article is an effort to increase the number of freely available corpora of this kind by providing a detailed error specification and leveraging the work of language experts.
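As a rough illustration of how a monolingual parallel corpus feeds a sequence-to-sequence model, the sketch below pairs each erroneous sentence with its corrected counterpart and holds out a small development set. The corpus file layout is not described here, so the inputs are plain Python lists; the function names `build_pairs` and `split_pairs`, the 10% hold-out, and the fixed seed are all assumptions made for the example, not part of the published resource.

```python
import random
from typing import List, Tuple

# One training example: (erroneous source sentence, corrected target sentence).
Pair = Tuple[str, str]


def build_pairs(erroneous: List[str], corrected: List[str]) -> List[Pair]:
    """Align the two sides of the corpus line by line (hypothetical helper)."""
    if len(erroneous) != len(corrected):
        raise ValueError("both sides of a parallel corpus must have the same length")
    return [(src.strip(), tgt.strip()) for src, tgt in zip(erroneous, corrected)]


def split_pairs(
    pairs: List[Pair], dev_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle deterministically and return (train, dev); the fraction is an assumption."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


if __name__ == "__main__":
    # Placeholder strings stand in for real sentences; they are not corpus data.
    wrong = [f"erroneous sentence {i}" for i in range(470)]
    right = [f"corrected sentence {i}" for i in range(470)]
    train, dev = split_pairs(build_pairs(wrong, right))
    print(len(train), len(dev))
```

With only 470 pairs, any held-out set is small, so a fixed seed keeps the split reproducible across experiments.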
