Journal: Cognitive Computation and Systems

Stacked residual blocks based encoder–decoder framework for human motion prediction

Abstract

Human motion prediction is an important and challenging task in computer vision with various applications. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have been proposed to address it. However, RNNs are limited in both long-term temporal modelling and spatial modelling of motion signals. CNNs offer inflexible spatial and temporal modelling, since their capability depends mainly on a large convolutional kernel and the stride of the convolution. Moreover, these methods predict multiple future poses recursively, which makes them prone to noise accumulation. The authors present a new encoder–decoder framework, based on residual convolutional blocks with a small filter, to predict future human poses. The framework can flexibly capture the hierarchical spatial and temporal representation of human motion signals from a motion capture sensor. Specifically, the encoder stacks multiple residual convolutional blocks to hierarchically encode the spatio-temporal features of previous poses. The decoder consists of two fully connected layers that reconstruct the spatial and temporal information of future poses in a non-recursive manner, which, unlike prior work, avoids noise accumulation. Experimental results show that the proposed method outperforms baselines on the Human3.6M dataset, demonstrating its effectiveness. The code is available at https://github.com/lily2lab/residual_prediction_network .
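
The structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above); it is a toy, single-channel version in plain Python with untrained random weights. All function names, the 3x3 kernel size, the number of blocks, and the hidden size are assumptions chosen only for illustration. It shows two ideas: stacking small-filter residual blocks widens the receptive field layer by layer, and a two-layer fully connected decoder emits all future frames in one forward pass instead of feeding predictions back in.

```python
import random

def conv3x3(x, k):
    """Zero-padded 3x3 convolution over a T x D grid (time x pose dimension)."""
    T, D = len(x), len(x[0])
    out = [[0.0] * D for _ in range(T)]
    for t in range(T):
        for d in range(D):
            s = 0.0
            for i in (-1, 0, 1):
                for j in (-1, 0, 1):
                    if 0 <= t + i < T and 0 <= d + j < D:
                        s += k[i + 1][j + 1] * x[t + i][d + j]
            out[t][d] = s
    return out

def residual_block(x, k):
    """y = x + ReLU(conv(x)); the skip connection passes the input through unchanged."""
    h = conv3x3(x, k)
    return [[xv + max(0.0, hv) for xv, hv in zip(xr, hr)] for xr, hr in zip(x, h)]

def dense(v, W, b):
    """Fully connected layer: W @ v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(rng, t_in, t_out, dim, n_blocks=3, hidden=32):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    return {
        "kernels": [rand_matrix(rng, 3, 3) for _ in range(n_blocks)],
        "W1": rand_matrix(rng, hidden, t_in * dim), "b1": [0.0] * hidden,
        "W2": rand_matrix(rng, t_out * dim, hidden), "b2": [0.0] * (t_out * dim),
    }

def predict(past, params, t_out):
    """Encode past poses with stacked residual blocks, then decode every
    future frame at once with two fully connected layers (non-recursive)."""
    x = past
    for k in params["kernels"]:          # encoder: stacked residual blocks
        x = residual_block(x, k)
    flat = [v for row in x for v in row]
    h = [max(0.0, a) for a in dense(flat, params["W1"], params["b1"])]
    out = dense(h, params["W2"], params["b2"])   # decoder output, length t_out * dim
    dim = len(past[0])
    return [out[t * dim:(t + 1) * dim] for t in range(t_out)]

# Usage: 10 observed frames of a 6-dimensional pose vector, 5 predicted frames.
rng = random.Random(0)
past = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(10)]
params = init_params(rng, t_in=10, t_out=5, dim=6)
future = predict(past, params, t_out=5)
print(len(future), len(future[0]))  # 5 6
```

Because `predict` never feeds a predicted frame back into the network, an error in one future frame cannot propagate into later ones, which is the property the abstract contrasts with recursive prediction.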
