Conference: International Conference on Computational Linguistics

Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension

Abstract

Machine reading comprehension (MRC) has recently attracted attention in the fields of natural language processing and machine learning. One problematic presumption of current MRC technologies is that every question can be answered by looking at a given text passage. However, to realize human-like language comprehension, a machine should also be able to distinguish not-answerable questions (NAQs) from answerable questions. To develop this functionality, a dataset incorporating hard-to-detect NAQs is vital; however, constructing one manually would be expensive. This paper proposes a dataset creation method that alters an existing MRC dataset, the Stanford Question Answering Dataset, and describes the resulting dataset. The value of this dataset is likely to increase if each NAQ in it is labeled with the difficulty of identifying it as an NAQ. This difficulty level would allow researchers to evaluate a machine's NAQ detection performance more precisely. Therefore, we propose a method for automatically assigning difficulty level labels, which essentially measures the similarity between a question and the target text passage. Our NAQ detection experiments demonstrate that the resulting dataset, with its difficulty level annotations, is valid and potentially useful in the development of advanced MRC models.
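To make the idea of similarity-based difficulty labels concrete, here is a minimal sketch. It is not the paper's method: the abstract only says that the labels are derived from the similarity between a question and the target passage. The token-overlap measure, the function names, the thresholds, and the assumption that a more similar NAQ is harder to detect are all illustrative choices.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_similarity(question, passage):
    """Fraction of distinct question tokens that also occur in the passage.

    Hypothetical stand-in for the similarity measure; the abstract does not
    specify which measure the authors use.
    """
    q = tokens(question)
    if not q:
        return 0.0
    return len(q & tokens(passage)) / len(q)


def difficulty_label(question, passage, thresholds=(0.34, 0.67)):
    """Map similarity to a coarse difficulty label.

    Assumption: an NAQ that looks more like the passage is harder to
    identify as not answerable. The thresholds are arbitrary.
    """
    low, high = thresholds
    score = overlap_similarity(question, passage)
    if score < low:
        return "easy"
    if score < high:
        return "medium"
    return "hard"


if __name__ == "__main__":
    passage = ("The Normans were the people who gave their name to "
               "Normandy, a region in France.")
    for q in ("Which river flows through Berlin?",
              "Who gave their name to Normandy?"):
        print(q, "->", difficulty_label(q, passage))
```

A real labeling scheme would likely use a stronger similarity measure (for example, weighted term or embedding similarity) and thresholds tuned on held-out data; the sketch only shows the shape of the computation.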
