Multimedia Tools and Applications

Temporal video scene segmentation using deep-learning



Abstract

Automatic temporal video scene segmentation (also known as video story segmentation) remains an open problem without a definitive solution in most cases. Among the available techniques, those showing the best results are multimodal, using features extracted from multiple modalities. Multimodal fusion may be performed by fusing all modalities into a single representation (early fusion) or by segmenting each modality separately and combining the results (late fusion), the latter being widely used due to its simplicity. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been successfully employed to extract features from multiple data sources, easing the development of early fusion methods. However, CNNs cannot adequately learn cues that are temporally distributed along the video, owing to their difficulty in modeling temporal dependencies among features. A deep learning approach that can learn such cues is the recurrent neural network (RNN). Successfully employed in text processing, RNNs are suited to analyzing data sequences of variable length and may better grasp the temporal relationships among low-level features of video segments, yielding more accurate scene boundary detection. This paper goes beyond directly applying RNNs and proposes a new multimodal approach for temporally segmenting a video into scenes. The approach builds a new architecture that carefully combines CNN and RNN capabilities, obtaining better efficacy on the task than related techniques on a public video dataset.
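The pipeline the abstract describes (per-shot CNN features from multiple modalities, early-fused, then scanned by an RNN that scores scene boundaries) can be sketched minimally as follows. This is an illustrative sketch, not the paper's actual architecture: the dimensions, the random stand-in features, and the untrained Elman-style recurrence are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; in a real system the per-shot visual and
# audio features would come from pretrained CNNs. Random vectors stand in.
D_VIS, D_AUD, D_HID = 8, 4, 16
T = 10  # number of shots in the video

visual = rng.normal(size=(T, D_VIS))  # stand-in for CNN visual features
audio = rng.normal(size=(T, D_AUD))   # stand-in for CNN audio features

# Early fusion: concatenate modalities into one representation per shot.
fused = np.concatenate([visual, audio], axis=1)  # shape (T, D_VIS + D_AUD)

# Untrained Elman RNN weights (illustrative only).
W_xh = rng.normal(scale=0.1, size=(D_VIS + D_AUD, D_HID))
W_hh = rng.normal(scale=0.1, size=(D_HID, D_HID))
w_out = rng.normal(scale=0.1, size=D_HID)

def boundary_scores(x):
    """Scan the fused shot sequence, emitting a boundary probability per shot."""
    h = np.zeros(D_HID)
    probs = []
    for x_t in x:
        h = np.tanh(x_t @ W_xh + h @ W_hh)              # recurrent state update
        probs.append(1.0 / (1.0 + np.exp(-(h @ w_out))))  # sigmoid boundary head
    return np.array(probs)

probs = boundary_scores(fused)
scene_breaks = np.flatnonzero(probs > 0.5)  # shots flagged as scene boundaries
print(probs.shape)  # (10,)
```

Because the recurrence carries state across shots, the boundary score for a shot can depend on the whole preceding sequence, which is precisely the temporal cue the abstract argues a plain CNN cannot capture.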

