Conference: International Joint Conference on Neural Networks

M3LA: A Novel Approach Based on Encoder-Decoder with Attention Framework for Multi-modal Multi-label Learning

Abstract

With the exponential growth of digital multimedia resources, most real-world data are represented in multi-modal form and usually carry multiple semantic labels. Multi-modal Multi-label (MMML) learning has therefore become an active research topic. However, previous methods have neglected either the relation between modalities and labels or the correlation among labels. In this paper, we consider the following three questions: (1) How can the correlation among labels be modeled? (2) Is there a correlation between modality and label? (3) Does the modal input order affect the prediction for an individual instance, and how can the most appropriate modal input sequence be found for each instance? To address these problems, we propose a novel method for MMML learning based on an Encoder-Decoder with attention framework, named MMML-Attention (M3LA). M3LA takes all of these issues into account. Specifically, benefiting from the Encoder-Decoder with attention structure, M3LA can model the relation between modalities and labels. In addition, we introduce a correlation matrix to learn the correlation among labels; the matrix is obtained as a parameter through the training process. Label prediction occurs at every step of the decoder, so the predicted labels are corrected continually until the most accurate prediction is obtained. To validate the effectiveness of the proposed method, we experimented on several widely used benchmark datasets and compared M3LA with state-of-the-art approaches.
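The abstract gives no equations, so the following is only a minimal sketch of the general idea it describes, not the authors' implementation. At each decoder step it attends over the encoded modalities, predicts all labels, and mixes the label scores through a label correlation matrix. Every name here (`decode`, `W_out`, `C`, the dot-product attention, the `tanh` state update) is a hypothetical choice made for illustration. In a real model `W_out` and `C` would be learned during training; here they are fixed toy values.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def decode(modalities, W_out, C, steps):
    """Toy attention decoder (illustrative assumption, not the paper's model).

    modalities: list of encoded modality vectors, each of dimension d
    W_out:      L x d projection from decoder state to label scores
    C:          L x L label correlation matrix
    Returns a list of (attention_weights, label_probabilities), one per step.
    """
    d = len(modalities[0])
    state = [0.0] * d
    history = []
    for _ in range(steps):
        # Dot-product attention: how relevant is each modality to the state?
        scores = [sum(s * h for s, h in zip(state, m)) for m in modalities]
        alpha = softmax(scores)
        # Context vector: attention-weighted sum of the modality vectors.
        context = [sum(a * m[k] for a, m in zip(alpha, modalities))
                   for k in range(d)]
        # Simple recurrent update of the decoder state.
        state = [math.tanh(s + c) for s, c in zip(state, context)]
        # Predict every label at this step, then let correlated labels
        # influence each other through C.
        logits = matvec(W_out, state)
        probs = [sigmoid(z) for z in matvec(C, logits)]
        history.append((alpha, probs))
    return history


random.seed(0)
modalities = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
W_out = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
# Identity plus a small off-diagonal term stands in for a learned matrix.
C = [[1.0 if i == j else 0.1 for j in range(5)] for i in range(5)]

history = decode(modalities, W_out, C, steps=3)
for alpha, probs in history:
    print([round(a, 3) for a in alpha], [round(p, 3) for p in probs])
```

Each printed row shows the attention weights over the three toy modalities followed by the five label probabilities at that step, which makes it possible to see how the prediction changes from one decoder step to the next.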
