Multimedia Tools and Applications

Temporal video scene segmentation using deep-learning



Abstract

Automatic temporal video scene segmentation (also known as video story segmentation) remains an open problem without a definitive solution in most cases. Among the available techniques, those showing the best results are multimodal, using features extracted from multiple modalities. Multimodal fusion may be performed by fusing all modalities into a single representation (early fusion) or by segmenting each modality separately and combining the results (late fusion), the latter being widely used due to its simplicity. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been successfully employed to extract features from multiple data sources, easing the development of early fusion methods. However, CNNs cannot adequately learn cues that are temporally distributed along the video, owing to their difficulty in modeling temporal dependencies among features. A deep learning approach that can learn such cues is the recurrent neural network (RNN). Successfully employed in text processing, RNNs are suited to analyzing data sequences of variable length and may better grasp the temporal relationships among low-level features of video segments, yielding more accurate scene boundary detection. This paper goes beyond directly applying RNNs and proposes a new multimodal approach for temporally segmenting a video into scenes. The approach builds a new architecture that carefully combines CNN and RNN capabilities, obtaining better efficacy on the task than related techniques on a public video dataset.
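The pipeline the abstract describes (per-shot CNN features from multiple modalities, early-fused, then scanned by an RNN that scores scene boundaries) can be sketched minimally as follows. This is an illustrative sketch, not the paper's actual architecture: the dimensions, the random stand-in features, and the untrained Elman-style recurrence are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; in a real system the per-shot visual and
# audio features would come from pretrained CNNs. Random vectors stand in.
D_VIS, D_AUD, D_HID = 8, 4, 16
T = 10  # number of shots in the video

visual = rng.normal(size=(T, D_VIS))  # stand-in for CNN visual features
audio = rng.normal(size=(T, D_AUD))   # stand-in for CNN audio features

# Early fusion: concatenate modalities into one representation per shot.
fused = np.concatenate([visual, audio], axis=1)  # shape (T, D_VIS + D_AUD)

# Untrained Elman RNN weights (illustrative only).
W_xh = rng.normal(scale=0.1, size=(D_VIS + D_AUD, D_HID))
W_hh = rng.normal(scale=0.1, size=(D_HID, D_HID))
w_out = rng.normal(scale=0.1, size=D_HID)

def boundary_scores(x):
    """Scan the fused shot sequence, emitting a boundary probability per shot."""
    h = np.zeros(D_HID)
    probs = []
    for x_t in x:
        h = np.tanh(x_t @ W_xh + h @ W_hh)              # recurrent state update
        probs.append(1.0 / (1.0 + np.exp(-(h @ w_out))))  # sigmoid boundary head
    return np.array(probs)

probs = boundary_scores(fused)
scene_breaks = np.flatnonzero(probs > 0.5)  # shots flagged as scene boundaries
print(probs.shape)  # (10,)
```

Because the recurrence carries state across shots, the boundary score for a shot can depend on the whole preceding sequence, which is precisely the temporal cue the abstract argues a plain CNN cannot capture.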

