Evaluation corpus for restricted-domain question-answering systems for the holy Quran


Abstract

This paper presents the compilation of a corpus of question-answer pairs for the holy Quran. The corpus, the Quran Arabic-English Question and Answer Corpus (QAEQ&AC), was collected manually from a wide range of sources. QAEQ&AC is a written, bilingual corpus comprising Arabic and English text. First, question-answer pairs were collected from several trusted expert sources. The data were then merged and cleaned using Microsoft Excel. Finally, the data were converted to a format suitable for mining tools, namely a comma-separated values (CSV) file. The resulting corpus consists of more than 1500 question-answer pairs, amounting to nearly 50,000 words across Arabic and English. It includes different question types, such as what, when, and why, and answers of different lengths. We anticipate that the current and subsequent versions of the corpus will be a valuable evaluation resource for computational linguists investigating Quran question answering; it might be used as a gold standard in research on natural language processing, information retrieval, and artificial intelligence. The corpus can also be annotated to derive linguistic information, such as morphological, syntactic, semantic, and lexical information.
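The merge, clean, and CSV-export steps described in the abstract can be illustrated with a minimal Python sketch. The authors report doing this work in Microsoft Excel, so the code below is only an analogous illustration, not their procedure. The column names in `FIELDS`, the helper names `merge_and_clean` and `to_csv`, and the deduplication rule are all assumptions; the abstract does not specify the actual CSV schema or cleaning criteria.

```python
import csv
import io

# Hypothetical column layout; the abstract does not give the real schema.
FIELDS = ["question", "answer", "language", "source"]


def merge_and_clean(*batches):
    """Merge batches of Q&A dicts, normalise whitespace,
    and drop incomplete or duplicate question-answer pairs."""
    seen = set()
    merged = []
    for batch in batches:
        for row in batch:
            # Collapse runs of whitespace and trim each field.
            cleaned = {f: " ".join(str(row.get(f, "")).split()) for f in FIELDS}
            if not cleaned["question"] or not cleaned["answer"]:
                continue  # skip pairs with a missing question or answer
            # Assumed duplicate rule: same question and answer, ignoring case.
            key = (cleaned["question"].casefold(), cleaned["answer"].casefold())
            if key in seen:
                continue
            seen.add(key)
            merged.append(cleaned)
    return merged


def to_csv(rows):
    """Serialise cleaned rows to CSV text with a header line."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder rows only; these are not taken from the corpus.
    source_a = [{"question": "  Placeholder question?  ", "answer": "Placeholder answer",
                 "language": "en", "source": "source-a"}]
    source_b = [{"question": "placeholder question?", "answer": "placeholder answer",
                 "language": "en", "source": "source-b"},
                {"question": "", "answer": "orphan answer",
                 "language": "en", "source": "source-b"}]
    print(to_csv(merge_and_clean(source_a, source_b)))
```

Python's `csv` module quotes fields that contain commas or line breaks, which matters for long free-text answers. For Arabic text, writing to a file opened with `encoding="utf-8"` would be the natural choice.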

Bibliographic details

  • Authors: Hamoud B; Atwell E
  • Year: 2017
  • Original format: PDF
  • Language: en
