A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers

Abstract

Readers of academic research papers often read with the goal of answering specific questions. Question Answering systems that can answer those questions can make consumption of the content much more efficient. However, building such tools requires data that reflect the difficulty of the task arising from complex reasoning about claims made in multiple parts of a paper. In contrast, existing information-seeking question answering datasets usually contain questions about generic factoid-type information. We therefore present QASPER, a dataset of 5,049 questions over 1,585 Natural Language Processing papers. Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text. The questions are then answered by a separate set of NLP practitioners who also provide supporting evidence to answers. We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F_1 points when answering them from entire papers, motivating further research in document-grounded, information-seeking QA, which our dataset is designed to facilitate.
