Journal: Multimedia Tools and Applications

Multi-modal sequence model with gated fully convolutional blocks for micro-video venue classification



Abstract

With the large number of micro-videos available in social network applications, the venue category of a micro-video provides highly valuable venue information that assists location-oriented applications, personalized services, and similar uses. In this paper, we formulate micro-video venue classification as a multi-modal sequential modeling problem. Unlike existing approaches that use long short-term memory (LSTM) models to capture the temporal patterns of micro-videos, we propose a multi-modal sequence model with gated fully convolutional blocks. Specifically, we first adopt three parallel gated fully convolutional blocks to extract spatiotemporal features from the visual, acoustic, and textual modalities of micro-videos. Then, an additional gated fully convolutional block fuses the spatiotemporal features of these three modalities. Finally, corresponding prototypes are learned simultaneously to make classification more robust than with the softmax classification function alone. Extensive experiments on a real-world benchmark dataset demonstrate the effectiveness of our model in terms of both Micro-F and Macro-F scores.
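The abstract gives no layer details, so the following is only a minimal pure-Python sketch of the kind of pipeline it describes, not the authors' implementation. It assumes that a gated fully convolutional block takes the gated-linear-unit form h = (X * W + b) ⊗ σ(X * V + c) used in gated convolutional networks, that the three modality sequences are already aligned to the same number of time steps, that fusion concatenates the per-step modality features before the fourth gated block, and that temporal mean pooling yields a clip-level vector. The helper names (`conv1d`, `gated_conv_block`, `init_block`, `forward`) and all sizes are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def conv1d(seq, kernel, bias):
    """Zero-padded 1-D convolution over time (output keeps length T for odd K).

    seq: T x C_in, kernel: K x C_in x C_out, bias: C_out.
    """
    T, K = len(seq), len(kernel)
    pad = K // 2
    out = []
    for t in range(T):
        row = list(bias)
        for k in range(K):
            src = t + k - pad
            if 0 <= src < T:  # positions outside the sequence count as zeros
                for i, x in enumerate(seq[src]):
                    for o, w in enumerate(kernel[k][i]):
                        row[o] += x * w
        out.append(row)
    return out


def gated_conv_block(seq, p):
    # Assumed GLU-style gate: (X*W + b) * sigmoid(X*V + c), elementwise.
    a = conv1d(seq, p["W"], p["b"])  # linear path
    g = conv1d(seq, p["V"], p["c"])  # gate path
    return [[ai * sigmoid(gi) for ai, gi in zip(ar, gr)] for ar, gr in zip(a, g)]


def init_block(c_in, c_out, k=3, rng=random):
    # Hypothetical small Gaussian initialisation of both conv paths.
    def weights():
        return [[[rng.gauss(0.0, 0.1) for _ in range(c_out)]
                 for _ in range(c_in)] for _ in range(k)]
    return {"W": weights(), "b": [0.0] * c_out, "V": weights(), "c": [0.0] * c_out}


def forward(visual, acoustic, textual, blocks):
    # Three parallel modality blocks (assumes all sequences share length T).
    feats = [gated_conv_block(visual, blocks["visual"]),
             gated_conv_block(acoustic, blocks["acoustic"]),
             gated_conv_block(textual, blocks["textual"])]
    # Assumed fusion: concatenate per time step, then one more gated block.
    fused_in = [v + a + t for v, a, t in zip(*feats)]
    fused = gated_conv_block(fused_in, blocks["fusion"])
    # Temporal mean pooling gives one clip-level vector.
    return [sum(col) / len(fused) for col in zip(*fused)]


rng = random.Random(0)
T, hidden = 8, 4
dims = {"visual": 6, "acoustic": 5, "textual": 3}  # hypothetical feature sizes
blocks = {name: init_block(d, hidden, rng=rng) for name, d in dims.items()}
blocks["fusion"] = init_block(3 * hidden, hidden, rng=rng)
inputs = {name: [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
          for name, d in dims.items()}
clip_vec = forward(inputs["visual"], inputs["acoustic"], inputs["textual"], blocks)
print(len(clip_vec))  # 4
```

The gate lets each channel decide, per time step, how much of the convolved signal to pass through. Because the convolutions see all time steps at once, the sequence is processed in parallel rather than step by step as in an LSTM.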
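The abstract only states that prototypes are learned alongside the classifier. One common formulation, assumed here rather than taken from the paper, is convolutional prototype learning: each class k has a learnable prototype m_k, the logits are negative squared distances −γ‖f − m_k‖², and the training loss adds a prototype term λ‖f − m_y‖² to the distance-based cross-entropy. This sketch only evaluates that loss and the nearest-prototype prediction for a clip-level feature f; γ, λ, the function names, and the example prototypes are hypothetical, and gradient updates are omitted.

```python
import math


def sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def log_softmax(z):
    # Stable log-softmax via the log-sum-exp trick.
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def prototype_logits(f, prototypes, gamma=1.0):
    # Closer prototype -> larger logit (assumed distance-based formulation).
    return [-gamma * sq_dist(f, m) for m in prototypes]


def prototype_loss(f, label, prototypes, gamma=1.0, lam=0.01):
    # Distance-based cross-entropy plus a term pulling f toward its class prototype.
    dce = -log_softmax(prototype_logits(f, prototypes, gamma))[label]
    pl = sq_dist(f, prototypes[label])
    return dce + lam * pl


def predict(f, prototypes):
    # Nearest-prototype decision rule.
    return min(range(len(prototypes)), key=lambda k: sq_dist(f, prototypes[k]))


protos = [[0.0, 0.0], [3.0, 3.0]]  # hypothetical 2-class prototypes
f = [0.2, -0.1]
print(predict(f, protos))  # 0
print(prototype_loss(f, 0, protos) < prototype_loss(f, 1, protos))  # True
```

Because the decision is made by distance to a prototype, a feature far from every prototype can be flagged as unreliable by thresholding its minimum distance; this is the robustness argument usually made for prototype classifiers over a plain softmax layer.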
