Journal: Procedia Computer Science

DAWQAS: A Dataset for Arabic Why Question Answering System


Abstract

A why question answering system is a tool designed to answer why-questions posed in natural language. Several papers have been published on the problem of answering why-questions. In particular, attempts have been made to analyze Arabic text and predict which passages are the best candidates for answering why-questions; however, these efforts relied on different datasets that are limited in size and not publicly available. To overcome these limitations, this paper introduces a new publicly available dataset, DAWQAS: Dataset for Arabic Why Question Answering System. It consists of 3205 why question-answer pairs. The pairs were first scraped from public Arabic websites; the texts were then preprocessed and converted to feature vectors. Afterwards, the why-answers were re-categorized based on their domains. Finally, the probabilities of rhetorical relations, based on discourse markers, were computed for each sentence in the dataset. DAWQAS is a valuable resource for research and evaluation in language understanding. The new dataset is freely available at https://github.com/masun/DAWQAS.
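The final step in the abstract, computing rhetorical-relation probabilities from discourse markers, can be illustrated with a minimal sketch. This is not the authors' method: the marker lexicon `MARKERS`, the relation labels, and the function `relation_probabilities` are all hypothetical, and the probability is assumed here to be a simple relative frequency of the markers found in a sentence. Whitespace tokenization with exact matching is also a simplification, since Arabic markers often carry attached prefixes or suffixes.

```python
from collections import Counter
from typing import Dict

# Hypothetical marker lexicon (an assumption for illustration only;
# it is not taken from the DAWQAS dataset or paper).
MARKERS = {
    "بسبب": "cause",    # "because of"
    "لكي": "purpose",   # "in order to"
    "لذلك": "result",   # "therefore"
}

def relation_probabilities(sentence: str) -> Dict[str, float]:
    """Return the relative frequency of each relation among the markers found."""
    tokens = sentence.split()  # naive whitespace tokenization
    counts = Counter(MARKERS[t] for t in tokens if t in MARKERS)
    total = sum(counts.values())
    if total == 0:
        return {}  # no known marker in this sentence
    return {relation: n / total for relation, n in counts.items()}
```

For a sentence containing one cause marker and one result marker, this sketch assigns each relation a probability of 0.5; a sentence with no listed marker yields an empty mapping.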
