Conference on Empirical Methods in Natural Language Processing

How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks



Abstract

Many recent papers address reading comprehension, where examples consist of (question, passage, answer) tuples. Presumably, a model must combine information from both questions and passages to predict corresponding answers. However, despite intense interest in the topic, with hundreds of published papers vying for leaderboard dominance, basic questions about the difficulty of many popular benchmarks remain unanswered. In this paper, we establish sensible baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets, finding that question- and passage-only models often perform surprisingly well. On 14 out of 20 bAbI tasks, passage-only models achieve greater than 50% accuracy, sometimes matching the full model. Interestingly, while CBT provides 20-sentence passages, only the last is needed for comparably accurate prediction. By comparison, SQuAD and CNN appear better-constructed.
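Because the abstract hinges on question-only and passage-only ablations, the following minimal Python sketch (not the authors' code; the function names and the naive sentence split are illustrative assumptions) shows how such ablated inputs might be derived from (question, passage, answer) tuples, including a last-sentence-only variant corresponding to the CBT observation.

from typing import Tuple

Example = Tuple[str, str, str]  # (question, passage, answer)

def question_only(example: Example) -> Example:
    # Blank out the passage so a model sees only the question.
    question, _passage, answer = example
    return (question, "", answer)

def passage_only(example: Example) -> Example:
    # Blank out the question so a model sees only the passage.
    _question, passage, answer = example
    return ("", passage, answer)

def last_sentence_only(example: Example) -> Example:
    # Keep only the final sentence of the passage (the CBT-style ablation);
    # splitting on "." is a crude stand-in for real sentence segmentation.
    question, passage, answer = example
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    last = sentences[-1] + "." if sentences else ""
    return (question, last, answer)

if __name__ == "__main__":
    ex = ("Where did Mary go?",
          "John went to the garden. Mary went to the kitchen.",
          "kitchen")
    print(passage_only(ex))
    print(last_sentence_only(ex))

Training the same reading-comprehension architecture on each ablated variant and comparing accuracy against the full (question, passage) model is the kind of diagnostic the paper uses to probe how much "reading" a benchmark actually requires.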
