DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension

Abstract

We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7,680 pairs of movie plots, where each pair reflects two versions of the same movie, one from Wikipedia and the other from IMDb, written by two different authors. We asked crowdsourced workers to create questions from one version of the plot and a different set of workers to extract or synthesize answers from the other version. This unique characteristic of DuoRC, where questions and answers are created from different versions of a document narrating the same underlying story, ensures by design that there is very little lexical overlap between the questions created from one version and the segments containing the answer in the other version. Further, since the two versions differ in level of plot detail, narration style, vocabulary, etc., answering questions from the second version requires deeper language understanding and the incorporation of external background knowledge. Additionally, the narrative style of passages arising from movie plots (as opposed to the typical descriptive passages in existing datasets) calls for complex reasoning over events across multiple sentences. Indeed, we observe that state-of-the-art neural RC models that have achieved near-human performance on the SQuAD dataset (Rajpurkar et al., 2016b) perform very poorly on DuoRC, even when coupled with traditional NLP techniques to address its challenges (F1 score of 37.42% on DuoRC vs. 86% on SQuAD). This opens up several interesting research avenues wherein DuoRC could complement other RC datasets to explore novel neural approaches for studying language understanding.
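
The abstract relies on two quantities: lexical overlap between a question and the passage holding its answer, and an F1 score for predicted answers. It does not say how either is computed, so the following is only a minimal sketch under stated assumptions: a token-overlap F1 in the style commonly used for SQuAD-type evaluation, and a simple overlap ratio (fraction of distinct question tokens found in the passage). The function names and the whitespace tokenizer are hypothetical and are not taken from the paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude, assumed tokenizer)."""
    return text.lower().split()


def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string.

    Assumption: SQuAD-style bag-of-tokens matching; the paper's own
    evaluation script may normalize text differently.
    """
    pred, ref = tokenize(prediction), tokenize(gold)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def question_overlap(question, passage):
    """Fraction of distinct question tokens that also occur in the passage.

    Assumption: an illustrative overlap measure, not the one used by the authors.
    """
    q_tokens = set(tokenize(question))
    if not q_tokens:
        return 0.0
    return len(q_tokens & set(tokenize(passage))) / len(q_tokens)
```

Under this sketch, a question written from one plot version and a paraphrased passage from the other version would tend to score low on `question_overlap` even when the passage contains the answer, which is the property the abstract attributes to DuoRC's design.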

Bibliographic details

Source: Annual Meeting of the Association for Computational Linguistics | 2018 | lxx p. 1372-2060 | 11 pages
Authors: Amrita Saha; Rahul Aralikatte; Mitesh M. Khapra; Karthik Sankaranarayanan
Original format: PDF
Chinese Library Classification: Programming, software engineering

