Journal: ACM Transactions on Multimedia Computing, Communications, and Applications

RCE-HIL: Recognizing Cross-media Entailment with Heterogeneous Interactive Learning

Abstract

Entailment recognition is an important reasoning paradigm that judges whether a hypothesis can be inferred from given premises. However, previous efforts have mainly concentrated on text-based reasoning, namely recognizing textual entailment (RTE), where both the hypotheses and the premises are textual. In fact, human reasoning is inherently cross-media: it relies on joint inference across different sensory organs, which supply complementary reasoning cues from distinct perspectives such as language, vision, and audition. Realizing cross-media reasoning has been a significant challenge to broadening and deepening entailment recognition. Therefore, this article extends RTE to a novel reasoning paradigm, recognizing cross-media entailment (RCE), and proposes a heterogeneous interactive learning (HIL) approach. Specifically, HIL recognizes entailment relationships via cross-media joint inference, from image-text premises to text hypotheses. It is an end-to-end architecture with two parts: (1) Cross-media hybrid embedding performs cross embedding of premises and hypotheses to generate their fine-grained representations. It aims to align cross-media inference cues via image-text and text-text interactive attention. (2) Heterogeneous joint inference constructs a heterogeneous interaction tensor space and extracts semantic features for entailment recognition. It aims to simultaneously capture the interaction between cross-media premises and hypotheses and to distinguish their entailment relationships. Experimental results on the widely used Stanford Natural Language Inference (SNLI) dataset, with image premises taken from the Flickr30K dataset, verify the effectiveness of HIL and the intrinsic inter-media complementarity in reasoning.
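
The first component, cross-media hybrid embedding, can be pictured as soft attention that lets each hypothesis token attend both to the premise words (text-text) and to the premise image regions (image-text). What follows is a minimal sketch under stated assumptions, not the authors' implementation. It uses plain dot-product attention and pure Python lists as vectors. It also assumes that image-region features have already been projected into the same dimension as the word embeddings and that the results are fused by simple concatenation. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def interactive_attention(queries, keys):
    """Soft-align every query vector to the key set.

    Each output row is the attention-weighted sum of the key vectors,
    with weights = softmax of dot-product scores between the query and each key.
    """
    dim = len(keys[0])
    aligned = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        aligned.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return aligned


def cross_media_hybrid_embedding(hypothesis, premise_words, premise_regions):
    """Hypothetical fusion: [token ; text-text aligned ; image-text aligned]."""
    text_aligned = interactive_attention(hypothesis, premise_words)     # text-text attention
    image_aligned = interactive_attention(hypothesis, premise_regions)  # image-text attention
    return [h + t + r for h, t, r in zip(hypothesis, text_aligned, image_aligned)]


# Toy usage: 2 hypothesis tokens, 3 premise words, 1 image region, all with d = 2.
hyp = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
regions = [[0.2, 0.8]]
fused = cross_media_hybrid_embedding(hyp, words, regions)
print(len(fused), len(fused[0]))  # 2 6
```

Concatenation keeps the original token and both aligned views side by side, so a downstream layer can weigh textual and visual evidence separately. A learned projection or gating step would be an equally plausible fusion choice.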
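
The second component, heterogeneous joint inference, can likewise be sketched in a few lines. This is an illustrative assumption rather than the paper's exact design. Text and image premise units are pooled into one heterogeneous premise set, and the interaction tensor is the element-wise product of every premise/hypothesis vector pair. Semantic features are extracted by max pooling, which stands in for whatever feature extractor the article actually uses. An untrained linear layer with softmax then scores the three SNLI labels (entailment, neutral, contradiction). All names and weights are hypothetical.

```python
import math

LABELS = ("entailment", "neutral", "contradiction")


def interaction_tensor(premise, hypothesis):
    """T[i][j][k] = premise[i][k] * hypothesis[j][k] for every premise/hypothesis pair."""
    return [[[p_k * h_k for p_k, h_k in zip(p, h)] for h in hypothesis] for p in premise]


def max_pool_features(tensor):
    """Max over all (i, j) positions for each channel k, giving one feature vector of size d."""
    d = len(tensor[0][0])
    return [max(cell[k] for row in tensor for cell in row) for k in range(d)]


def classify(features, weights, biases):
    """Linear layer + softmax over the three labels; weights is a 3 x d list (hypothetical)."""
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return dict(zip(LABELS, (e / total for e in exps)))


def heterogeneous_joint_inference(premise_text, premise_image, hypothesis, weights, biases):
    """Stack text and image premise units into one premise set, then build, pool, and classify."""
    premise = premise_text + premise_image
    features = max_pool_features(interaction_tensor(premise, hypothesis))
    return classify(features, weights, biases)


# Toy usage with d = 2 and hand-picked (untrained) classifier weights.
premise_text = [[1.0, 0.0], [0.0, 1.0]]  # two premise words
premise_image = [[0.5, 0.5]]             # one image region
hypothesis = [[1.0, 1.0]]                # one hypothesis token
weights = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
probs = heterogeneous_joint_inference(premise_text, premise_image, hypothesis, weights, biases)
print(max(probs, key=probs.get))  # entailment
```

With these toy inputs the pooled feature vector is [1.0, 1.0], so the logits are 2, 0, and -2, and "entailment" receives the highest probability. In a trained model, a convolutional or attention-based extractor over the tensor would replace the single max-pooling step.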
