Annual Meeting of the Association for Computational Linguistics; International Joint Conference on Natural Language Processing

A Neural Model for Joint Document and Snippet Ranking in Question Answering for Large Document Collections



Abstract

Question answering (QA) systems for large document collections typically use pipelines that (i) retrieve possibly relevant documents, (ii) re-rank them, (iii) rank paragraphs or other snippets of the top-ranked documents, and (iv) select spans of the top-ranked snippets as exact answers. Pipelines are conceptually simple, but errors propagate from one component to the next, without later components being able to revise earlier decisions. We present an architecture for joint document and snippet ranking, the two middle stages, which leverages the intuition that relevant documents have good snippets and good snippets come from relevant documents. The architecture is general and can be used with any neural text relevance ranker. We experiment with two main instantiations of the architecture, based on POSIT-DRMM (PDRMM) and a BERT-based ranker. Experiments on biomedical data from BIOASQ show that our joint models vastly outperform the pipelines in snippet retrieval, the main goal for QA, with fewer trainable parameters, while also remaining competitive in document retrieval. Furthermore, our joint PDRMM-based model is competitive with BERT-based models, despite using orders of magnitude fewer parameters. These claims are also supported by human evaluation on two test batches of BIOASQ. To test our key findings on another dataset, we modified the Natural Questions dataset so that it can also be used for document and snippet retrieval. Our joint PDRMM-based model again outperforms the corresponding pipeline in snippet retrieval on the modified Natural Questions dataset, even though it performs worse than the pipeline in document retrieval. We make our code and the modified Natural Questions dataset publicly available.
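The joint architecture described in the abstract can be pictured as a single model that scores each candidate snippet against the question and derives the document score from its snippet scores, so that both rankings are trained together rather than in separate pipeline stages. Below is a minimal, illustrative sketch of that idea in PyTorch; it is not the authors' implementation, and all names (JointRanker, snippet_scorer, doc_scorer) are hypothetical stand-ins for the neural relevance rankers (e.g., PDRMM or a BERT-based ranker) mentioned above.

```python
# Minimal sketch (not the authors' code) of joint document/snippet ranking,
# assuming a generic neural text relevance scorer for snippets.
import torch
import torch.nn as nn

class JointRanker(nn.Module):
    def __init__(self, emb_dim: int):
        super().__init__()
        # Stand-in snippet relevance scorer; in the paper this role is played
        # by a neural ranker such as PDRMM or a BERT-based model.
        self.snippet_scorer = nn.Sequential(
            nn.Linear(2 * emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, 1)
        )
        # Combines the best snippet score with a document-level feature,
        # encoding the intuition that relevant documents have good snippets.
        self.doc_scorer = nn.Linear(2, 1)

    def forward(self, query_vec, snippet_vecs, doc_feature):
        # query_vec: (emb_dim,), snippet_vecs: (num_snippets, emb_dim)
        q = query_vec.unsqueeze(0).expand_as(snippet_vecs)
        snippet_scores = self.snippet_scorer(
            torch.cat([q, snippet_vecs], dim=-1)
        ).squeeze(-1)                        # one relevance score per snippet
        best_snippet = snippet_scores.max()  # good snippets -> relevant doc
        doc_score = self.doc_scorer(torch.stack([best_snippet, doc_feature]))
        return doc_score, snippet_scores

# Toy usage: the document-ranking and snippet-ranking losses are summed,
# so both stages are trained jointly instead of as a pipeline.
model = JointRanker(emb_dim=8)
doc_score, snip_scores = model(torch.randn(8), torch.randn(5, 8), torch.tensor(0.3))
loss = nn.functional.binary_cross_entropy_with_logits(
    doc_score.squeeze(), torch.tensor(1.0)
) + nn.functional.binary_cross_entropy_with_logits(snip_scores, torch.zeros(5))
loss.backward()
```

Because the document score depends on the snippet scores, gradients from the document-ranking loss also flow into the snippet scorer, which is one way to realize the intuition that relevant documents have good snippets and good snippets come from relevant documents.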


