Venue: International Conference on Text, Speech and Dialogue

A Picture Is Worth a Thousand Words: Towards Multimodal, Multilingual Context Models

Abstract

In Computational Linguistics, work towards understanding or generating language has been based primarily on textual information. However, when we humans process a text, be it written or spoken, we also take into account cues from the context in which such a text appears, in addition to our background and common-sense knowledge. This is also the case when we translate text. For example, a news article will often contain images and may also contain a short video and/or audio clip. Users of social media often post photos and videos accompanied by short textual descriptions. The additional information can help minimise ambiguities and elicit unknown words. In this talk I will introduce a recent area of research that addresses the automatic translation of texts from rich context models that incorporate multimodal information, focusing on visual cues from images. I will cover some of our recent work analysing how humans perform translation in the presence or absence of visual cues, and then move on to datasets and computational models proposed for this problem.
