Learning Dense Representations of Phrases at Scale

Abstract

Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on demand during inference (Seo et al., 2019). However, current phrase retrieval models heavily depend on sparse representations and still underperform retriever-reader approaches. In this work, we show for the first time that we can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA. We present an effective method to learn phrase representations from the supervision of reading comprehension tasks, coupled with novel negative sampling methods. We also propose a query-side fine-tuning strategy, which can support transfer learning and reduce the discrepancy between training and inference. On five popular open-domain QA datasets, our model DensePhrases improves over previous phrase retrieval models by 15%-25% absolute accuracy and matches the performance of state-of-the-art retriever-reader models. Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs. Finally, we directly use our pre-indexed dense phrase representations for two slot filling tasks, showing the promise of utilizing DensePhrases as a dense knowledge base for downstream tasks.
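The abstract describes answering open-domain questions by searching a pre-built index of dense phrase vectors directly, with no on-the-fly document reading. Below is a minimal sketch of that retrieval step as maximum inner-product search, not the authors' implementation: the phrase list and the random vectors standing in for encoder outputs are hypothetical placeholders, and it assumes the faiss similarity-search library is available.

```python
# Minimal sketch: phrase retrieval as maximum inner-product search over
# pre-indexed dense phrase vectors. Not the authors' code; the phrases
# and random vectors below are placeholders for real encoder outputs.
import numpy as np
import faiss  # assumed available: similarity-search library

dim = 128                       # embedding size (illustrative)
rng = np.random.default_rng(0)

# Stand-ins for phrase vectors that would be computed once, offline.
phrases = ["Charles Darwin", "1859", "HMS Beagle"]
phrase_vecs = rng.standard_normal((len(phrases), dim)).astype("float32")

index = faiss.IndexFlatIP(dim)  # exact inner-product index
index.add(phrase_vecs)          # index every phrase vector once

def retrieve(question_vec: np.ndarray, k: int = 1):
    """Return the top-k phrases for an already-encoded question."""
    q = question_vec.reshape(1, -1).astype("float32")
    scores, ids = index.search(q, k)
    return [(phrases[i], float(s)) for i, s in zip(ids[0], scores[0])]

# At inference time only the question is encoded; because the phrase
# index is dense and static, the search parallelizes easily.
question_vec = rng.standard_normal(dim).astype("float32")
print(retrieve(question_vec))
```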
