Multi-modal sequence model with gated fully convolutional blocks for micro-video venue classification

Wei Liu; Xianglin Huang; Gang Cao; Jianglong Zhang; Gege Song; Lifang Yang

Abstract

With the large number of micro-videos available in social network applications, the venue category of a micro-video provides valuable location information that can support location-oriented applications, personalized services, and similar uses. In this paper, we formulate micro-video venue classification as a multi-modal sequential modeling problem. Unlike existing approaches that use long short-term memory (LSTM) models to capture temporal patterns in micro-videos, we propose a multi-modal sequence model with gated fully convolutional blocks. Specifically, we first adopt three parallel gated fully convolutional blocks to extract spatiotemporal features from the visual, acoustic, and textual modalities of micro-videos. Then, an additional gated fully convolutional block fuses the spatiotemporal features of the three modalities. Finally, a corresponding prototype is learned simultaneously to improve robustness compared with the softmax classification function. Extensive experimental results on a real-world benchmark dataset demonstrate the effectiveness of our model in terms of both Micro-F and Macro-F scores.
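To make the pipeline in the abstract concrete, here is a minimal NumPy sketch of three parallel gated convolutional branches followed by a gated fusion block. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the gating form, so GLU-style gating (a convolution multiplied element-wise by a sigmoid-activated convolution) is assumed; the three modalities are assumed to arrive as feature sequences of equal length; and all names (`gated_conv_block`, `init_params`, `forward`), layer sizes, and the mean pooling over time are hypothetical. Prototype learning is not covered.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1-D convolution over time (cross-correlation, zero 'same' padding).
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,) -> (T, C_out)."""
    k = w.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        window = xp[t:t + k]  # (K, C_in) slice centred on step t
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv_block(x, params):
    """Assumed GLU-style gate: conv(x) * sigmoid(conv_gate(x))."""
    w_a, b_a, w_g, b_g = params
    return conv1d_same(x, w_a, b_a) * sigmoid(conv1d_same(x, w_g, b_g))

def init_params(rng, k, c_in, c_out):
    """Random weights for one block (hypothetical initialisation)."""
    scale = 1.0 / np.sqrt(k * c_in)
    return (rng.normal(0.0, scale, (k, c_in, c_out)), np.zeros(c_out),
            rng.normal(0.0, scale, (k, c_in, c_out)), np.zeros(c_out))

def forward(visual, acoustic, textual, branch_params, fusion_params):
    """Three parallel gated blocks, then one gated block for fusion."""
    feats = [gated_conv_block(x, p)
             for x, p in zip((visual, acoustic, textual), branch_params)]
    fused_in = np.concatenate(feats, axis=1)   # (T, 3 * hidden)
    fused = gated_conv_block(fused_in, fusion_params)
    return fused.mean(axis=0)                  # assumed pooling over time

rng = np.random.default_rng(0)
T, dims, hidden = 8, (16, 12, 10), 4           # illustrative sizes only
branch_params = [init_params(rng, 3, d, hidden) for d in dims]
fusion_params = init_params(rng, 3, 3 * hidden, hidden)
inputs = [rng.normal(size=(T, d)) for d in dims]
video_vector = forward(*inputs, branch_params, fusion_params)
print(video_vector.shape)
```

In a full model the pooled vector would feed a classifier over venue categories; the convolutional design lets all time steps be processed in parallel, in contrast to the step-by-step recurrence of an LSTM.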


Bibliographic details

Source
Multimedia Tools and Applications | 2020, Issue 10 | pp. 6709-6726 | 18 pages
Authors
Wei Liu; Xianglin Huang; Gang Cao; Jianglong Zhang; Gege Song; Lifang Yang
Author affiliations

School of Computer Science and Cybersecurity, Communication University of China, Beijing, China; Nanyang Institute of Technology, Nanyang, China

School of Computer Science and Cybersecurity, Communication University of China, Beijing, China

School of Computer Science and Cybersecurity, Communication University of China, Beijing, China

State Grid Fujian Information and Telecommunication Company, Fuzhou, China

School of Computer Science and Cybersecurity, Communication University of China, Beijing, China

School of Computer Science and Cybersecurity, Communication University of China, Beijing, China

Original format: PDF
Language: English
Keywords
Micro-video venue classification; Gated fully convolutional block; Multi-modal sequence model; Prototype learning

Similar literature

1. Cross-modal context-gated convolution for multi-modal sentiment analysis [J]. Wen Huanglu, You Shaodi, Fu Ying. Pattern Recognition Letters, 2021, Issue Juna.
2. Mutual Complementarity: Multi-Modal Enhancement Semantic Learning for Micro-Video Scene Recognition [J]. Guo Jie, Nie Xiushan, Yin Yilong. Quality Control, Transactions, 2020.
3. pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks [J]. Budach Stefan, Marsico Annalisa. Bioinformatics, 2018, Issue 17.
4. Temporal Attention-Gated Model for Robust Sequence Classification [C]. Wenjie Pei, Tadas Baltrušaitis, David M. J. Tax. IEEE Conference on Computer Vision and Pattern Recognition, 2017.
5. A Convolutional Neural Network-based Approach to Personalized 3D Modeling of the Human Body and Its Classification [D]. Basu, Semanti. 2020.
6. pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks [O]. Stefan Budach, Annalisa Marsico.
7. Joint Learning of NNeXtVLAD, CNN and Context Gating for Micro-Video Venue Classification [O]. Wei Liu, Xianglin Huang, Gang Cao. 2019.
