PubMedQA: A Dataset for Biomedical Research Question Answering

Abstract

We introduce PubMedQA, a novel biomedical question answering (QA) dataset collected from PubMed abstracts. The task of PubMedQA is to answer research questions with yes/no/maybe (e.g., Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?) using the corresponding abstracts. PubMedQA has 1k expert-annotated, 61.2k unlabeled and 211.3k artificially generated QA instances. Each PubMedQA instance is composed of (1) a question, which is either an existing research article title or derived from one, (2) a context, which is the corresponding abstract without its conclusion, (3) a long answer, which is the conclusion of the abstract and, presumably, answers the research question, and (4) a yes/no/maybe answer, which summarizes the conclusion. PubMedQA is the first QA dataset where reasoning over biomedical research texts, especially their quantitative contents, is required to answer the questions. Our best-performing model, multi-phase fine-tuning of BioBERT with long answer bag-of-word statistics as additional supervision, achieves 68.1% accuracy, compared to single human performance of 78.0% accuracy and a majority baseline of 55.2% accuracy, leaving much room for improvement.
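As a rough illustration of the four-part instance structure and of what a majority baseline measures, the following minimal Python sketch may help. It is not the authors' code: the class name `PubMedQAInstance`, its field names, the helper `majority_baseline_accuracy`, and the toy records are all assumptions made for this example, and the placeholder strings stand in for real abstract text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PubMedQAInstance:
    """Hypothetical container mirroring the four parts named in the abstract."""
    question: str        # article title, or a question derived from it
    context: str         # abstract without its conclusion
    long_answer: str     # conclusion of the abstract
    final_decision: str  # "yes", "no", or "maybe"


def majority_baseline_accuracy(instances):
    """Always predict the most frequent label; return (label, accuracy)."""
    counts = Counter(inst.final_decision for inst in instances)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(instances)


# Toy records with placeholder text and made-up labels (not from the dataset).
toy = [
    PubMedQAInstance("question 1?", "context 1", "conclusion 1", "yes"),
    PubMedQAInstance("question 2?", "context 2", "conclusion 2", "yes"),
    PubMedQAInstance("question 3?", "context 3", "conclusion 3", "no"),
    PubMedQAInstance("question 4?", "context 4", "conclusion 4", "maybe"),
]

label, accuracy = majority_baseline_accuracy(toy)
print(label, accuracy)
```

On the real expert-annotated set, the same kind of computation is what the reported 55.2% majority baseline refers to: the accuracy obtained by always answering with the most common label.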

Bibliographic details

Source
International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing | 2019 | pp. 2567-2577 | 11 pages
Authors
Qiao Jin; Bhuwan Dhingra; Zhengping Liu; William W. Cohen; Xinghua Lu
Original format: PDF

