Published in: International Conference on Computational Linguistics

Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension


Abstract

Machine reading comprehension (MRC) has recently attracted attention in natural language processing and machine learning. A problematic presumption of current MRC technologies is that every question can be answered from the given text passage. To achieve human-like language comprehension, however, a machine should also be able to distinguish not-answerable questions (NAQs) from answerable ones. Developing this capability requires a dataset that includes hard-to-detect NAQs, but constructing one manually would be expensive. This paper proposes a dataset creation method that alters an existing MRC dataset, the Stanford Question Answering Dataset, and describes the resulting dataset. The value of this dataset is likely to increase if each NAQ is labeled with how difficult it is to identify as an NAQ, because such difficulty levels would allow researchers to evaluate a machine's NAQ detection performance more precisely. We therefore propose a method for automatically assigning difficulty-level labels, which essentially measures the similarity between a question and the target text passage. Our NAQ detection experiments demonstrate that the resulting dataset, with its difficulty-level annotations, is valid and potentially useful for developing advanced MRC models.
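
The abstract does not say which similarity measure or how many difficulty levels the authors use, so the following is only a minimal sketch of the general idea of labeling an NAQ by its similarity to the passage. It assumes a simple word-overlap score, three levels, and that an NAQ sharing more vocabulary with the passage is harder to detect. The function names `question_coverage` and `difficulty_label` and the threshold values are hypothetical and do not come from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_coverage(question, passage):
    """Fraction of distinct question tokens that also occur in the passage.

    Assumption: this word-overlap score stands in for whatever
    similarity measure the paper actually uses.
    """
    q_tokens = set(tokenize(question))
    if not q_tokens:
        return 0.0
    p_tokens = set(tokenize(passage))
    return len(q_tokens & p_tokens) / len(q_tokens)


def difficulty_label(question, passage, thresholds=(0.34, 0.67)):
    """Map the similarity score to a hypothetical three-level label.

    Assumption: higher similarity to the passage means the NAQ is
    harder to detect. The thresholds are illustrative only.
    """
    score = question_coverage(question, passage)
    low, high = thresholds
    if score < low:
        return "easy"
    if score < high:
        return "medium"
    return "hard"


if __name__ == "__main__":
    passage = "The Eiffel Tower was completed in 1889 in Paris."
    # Neither question can be answered from the passage.
    for question in (
        "Who painted the Mona Lisa?",
        "When was the Eiffel Tower in Paris demolished?",
    ):
        print(question, "->", difficulty_label(question, passage))
```

In this sketch, a question about an unrelated topic shares almost no words with the passage and is labeled easy to reject, while a question that reuses the passage's entities but asks about something the passage never states is labeled hard.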
