Visual Grounding Strategies for Text-Only Natural Language Processing

Abstract

Visual grounding is a promising path toward more robust and accurate Natural Language Processing (NLP) models. Many multimodal extensions of BERT (e.g., VideoBERT, LXMERT, VL-BERT) allow joint modeling of texts and images, leading to state-of-the-art results on multimodal tasks such as Visual Question Answering. Here, we leverage multimodal modeling for purely textual tasks (language modeling and classification), with the expectation that multimodal pretraining provides a grounding that can improve text processing accuracy. We propose two types of strategies to this end. The first, referred to as transferred grounding, consists in applying multimodal models to text-only tasks, using a placeholder to replace the image input. The second, which we call associative grounding, harnesses image retrieval to match texts with related images during both pretraining and text-only downstream tasks. We draw further distinctions within both strategies and then compare them according to their impact on language modeling and commonsense-related downstream tasks, showing improvements over text-only baselines.
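To illustrate the first strategy, here is a minimal sketch of transferred grounding built on the LXMERT implementation in HuggingFace Transformers (not the authors' code): the multimodal model is run on text alone, with an all-zero tensor standing in for the image input. The zero-valued placeholder and the checkpoint name are assumptions for illustration; the abstract does not specify which placeholder is used.

import torch
from transformers import LxmertModel, LxmertTokenizer

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

# Text-only input for a purely textual task.
inputs = tokenizer("A dog chases a ball in the park.", return_tensors="pt")

# Placeholder visual input: LXMERT expects region features (2048-d) and
# bounding-box positions (4-d); a single all-zero region stands in for
# the missing image (an illustrative choice, not the paper's).
visual_feats = torch.zeros(1, 1, 2048)
visual_pos = torch.zeros(1, 1, 4)

outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)
text_repr = outputs.language_output  # text-token embeddings from the multimodal encoder

Under the second strategy, associative grounding, the placeholder would instead be replaced by features of an image retrieved as a match for the input text.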