Published in: Conference on Empirical Methods in Natural Language Processing (EMNLP)

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering

Abstract

We present a new kind of question answering dataset, OpenBookQA, modeled after open book exams for assessing human understanding of a subject. The open book that comes with our questions is a set of 1,326 elementary-level science facts. Roughly 6,000 questions probe an understanding of these facts and their application to novel situations. Answering them requires combining an open book fact (e.g., metals conduct electricity) with broad common knowledge (e.g., a suit of armor is made of metal) obtained from other sources. Existing QA datasets over documents or knowledge bases are generally self-contained and focus on linguistic understanding. OpenBookQA instead probes a deeper understanding of both the topic, in the context of common knowledge, and the language it is expressed in. Human performance on OpenBookQA is close to 92%, but many state-of-the-art pre-trained QA methods perform surprisingly poorly, worse than several simple neural baselines we develop. Our oracle experiments, designed to circumvent the knowledge retrieval bottleneck, demonstrate the value of both the open book and additional facts. We leave it as a challenge to solve the retrieval problem in this multi-hop setting and to close the large gap to human performance.
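To make the two-hop structure of the armor example concrete, here is a toy sketch in Python. It assumes facts are stored as (subject, relation, object) triples; the names `OPEN_BOOK`, `COMMON_KNOWLEDGE`, and `can_conduct` are hypothetical and illustrative only. This is not the OpenBookQA authors' method, and the dataset itself expresses its facts as natural-language sentences rather than triples.

```python
# Toy illustration (hypothetical representation, not the authors' method):
# chain a common-knowledge fact with an "open book" science fact.
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical "open book": elementary science facts as triples.
OPEN_BOOK: Set[Triple] = {
    ("metal", "conducts", "electricity"),
}

# Hypothetical common knowledge obtained "from other sources".
COMMON_KNOWLEDGE: Set[Triple] = {
    ("suit of armor", "made_of", "metal"),
}


def can_conduct(entity: str, book: Set[Triple], common: Set[Triple]) -> bool:
    """Return True if the facts link `entity` to conducting electricity.

    Hop 1 (common knowledge): entity made_of material.
    Hop 2 (open book fact):   material conducts electricity.
    """
    # A fact stated directly in the book needs no chaining.
    if (entity, "conducts", "electricity") in book:
        return True
    # Hop 1: collect every material the entity is made of.
    materials = {o for (s, r, o) in common if s == entity and r == "made_of"}
    # Hop 2: check whether any of those materials conducts electricity.
    return any((m, "conducts", "electricity") in book for m in materials)


print(can_conduct("suit of armor", OPEN_BOOK, COMMON_KNOWLEDGE))  # True
print(can_conduct("suit of armor", OPEN_BOOK, set()))  # False
```

Dropping the common-knowledge triple makes the check fail even though the open book fact is still present, which mirrors the abstract's point that the open book alone is not enough and additional facts must be retrieved.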