International Workshop on Content-Based Multimedia Indexing

Online multimodal matrix factorization for human action video indexing

Abstract

This paper addresses the problem of searching for videos containing instances of specific human actions. The proposed strategy builds a multimodal latent space representation where both visual content and annotations are simultaneously mapped. The hypothesis behind the method is that such a latent space yields better results when built from multiple data modalities. The semantic embedding is learned using matrix factorization through stochastic gradient descent, which makes it suitable to deal with large-scale collections. The method is evaluated on a large-scale human action video dataset with three modalities corresponding to action labels, action attributes and visual features. The evaluation is based on a query-by-example strategy, where a sample video is used as input to the system. A retrieved video is considered relevant if it contains an instance of the same human action present in the query. Experimental results show that the learned multimodal latent semantic representation produces improved performance when compared with an exclusively visual representation.
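A minimal sketch of the kind of multimodal factorization the abstract describes, under simplifying assumptions: each modality matrix is modeled as X_m ≈ U V_mᵀ, with a shared per-video latent matrix U and a modality-specific projection V_m, trained by stochastic gradient descent. The toy data, dimensions, and hyperparameters below are illustrative, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy collection: 50 videos with three modalities
# (visual features, action labels, action attributes).
n_videos = 50
modalities = {
    "visual": rng.normal(size=(n_videos, 20)),
    "labels": rng.integers(0, 2, size=(n_videos, 10)).astype(float),
    "attributes": rng.integers(0, 2, size=(n_videos, 15)).astype(float),
}

k = 8      # latent dimension
lr = 0.01  # SGD step size
lam = 0.1  # L2 regularization weight

# Shared latent video representation U (n_videos x k) and one
# projection V[m] (d_m x k) per modality.
U = rng.normal(scale=0.1, size=(n_videos, k))
V = {m: rng.normal(scale=0.1, size=(X.shape[1], k))
     for m, X in modalities.items()}

for epoch in range(200):
    # Stochastic updates: visit one video at a time in random order.
    for i in rng.permutation(n_videos):
        for m, X in modalities.items():
            err = X[i] - U[i] @ V[m].T               # reconstruction error
            grad_U = -err @ V[m] + lam * U[i]
            grad_V = -np.outer(err, U[i]) + lam * V[m]
            U[i] -= lr * grad_U
            V[m] -= lr * grad_V

# Query-by-example: rank videos by cosine similarity to the query's
# latent vector in the shared semantic space.
q = U[0]
scores = U @ q / (np.linalg.norm(U, axis=1) * np.linalg.norm(q) + 1e-9)
ranking = np.argsort(-scores)
assert ranking[0] == 0  # the query video ranks itself first
```

Because U is shared across modalities, annotations (labels, attributes) shape the latent space even though a query at retrieval time only needs a video's latent vector; per-video SGD updates keep the method tractable on large collections.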

