A Log-likelihood Regularized KL Divergence for Video Prediction With a 3D Convolutional Variational Recurrent Network



Abstract

The use of latent variable models has been shown to be a powerful tool for modeling probability distributions over sequences. In this paper, we introduce a new variational model that extends the recurrent network in two ways for the task of video frame prediction. First, we introduce 3D convolutions inside all modules, including the recurrent model for future frame prediction, inputting and outputting a sequence of video frames at each timestep. This enables us to better exploit spatiotemporal information inside the variational recurrent model, allowing us to generate high-quality predictions. Second, we enhance the latent loss of the variational model by introducing a maximum likelihood estimate in addition to the KL divergence that is commonly used in variational models. This simple extension acts as a stronger regularizer in the variational autoencoder loss function and lets us obtain better results and generalizability. Experiments show that our model outperforms existing video prediction methods on several benchmarks while requiring fewer parameters.
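The second extension described in the abstract, a likelihood term added to the usual KL divergence, can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration only: diagonal Gaussian prior and posterior, a likelihood term computed as the negative log-likelihood of a posterior sample `z` under the prior, and a hypothetical weight `nll_weight`. The function names are likewise hypothetical and are not taken from the paper.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for diagonal Gaussians given as lists of means and std devs."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def gaussian_nll(z, mu, sigma):
    """Negative log-likelihood of the vector z under a diagonal Gaussian."""
    total = 0.0
    for zi, m, s in zip(z, mu, sigma):
        total += 0.5 * math.log(2 * math.pi) + math.log(s) + (zi - m) ** 2 / (2 * s ** 2)
    return total


def latent_loss(z, mu_q, sigma_q, mu_p, sigma_p, nll_weight=1.0):
    """Assumed form of the regularized latent loss: KL plus a weighted NLL term.

    This is an illustrative guess, not the formulation from the paper:
    z is a sample from the posterior q, scored under the prior p.
    """
    kl = gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
    nll = gaussian_nll(z, mu_p, sigma_p)
    return kl + nll_weight * nll


if __name__ == "__main__":
    # Posterior and prior parameters for a 2-dimensional latent (made-up values).
    mu_q, sigma_q = [0.2, -0.1], [0.9, 1.1]
    mu_p, sigma_p = [0.0, 0.0], [1.0, 1.0]
    z = [0.3, -0.2]  # would normally be sampled from q
    print(latent_loss(z, mu_q, sigma_q, mu_p, sigma_p))
```

Under these assumptions the extra term penalizes posterior samples that are unlikely under the prior, in addition to the distribution-level mismatch already measured by the KL term; setting `nll_weight=0.0` recovers the plain KL latent loss.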


Bibliographic record

Source
IEEE Winter Applications and Computer Vision Workshops | 2021 | pp. 209-217 | 9 pages
Authors
Haziq Razali; Basura Fernando
Original format
PDF
Keywords
Solid modeling; Three-dimensional displays; Conferences; Stochastic processes; Computer architecture; Predictive models; Tools


Similar literature

1. Motion Sickness Prediction in Stereoscopic Videos using 3D Convolutional Neural Networks [J]. Lee Tae Min, Yoon Jong-Chul, Lee In-Kwon. IEEE Transactions on Visualization and Computer Graphics. 2019, No. 5

2. Adult content detection in videos with convolutional and recurrent neural networks [J]. Jônatas Wehrmann, Gabriel S. Simões, Rodrigo C. Barros. Neurocomputing. 2018, No. jana10

3. Interaction-Aware Trajectory Prediction based on a 3D Spatio-Temporal Tensor Representation using Convolutional–Recurrent Neural Networks [C]. Martin Krüger, Anne Stockem Novo, Till Nattermann. IEEE Intelligent Vehicles Symposium. 2020

4. Identifying Sports Players in Broadcast Videos Using Recurrent and Convolutional Neural Networks [D]. Chan, Alvin. 2018

5. Automatic Detection of the Pharyngeal Phase in Raw Videos for the Videofluoroscopic Swallowing Study Using Efficient Data Collection and 3D Convolutional Networks [O]. Jong Taek Lee, Eunhee Park, Tae-Du Jung. 2019

6. Recurrent Fully Convolutional Networks Based on Optical Flow for Video Eyes Fixation Prediction [O]. Jiu-chen SHI, Dong ZHANG. 2018
