IEEE International Conference on Acoustics, Speech and Signal Processing

WATCH, LISTEN ONCE, AND SYNC: AUDIO-VISUAL SYNCHRONIZATION WITH MULTI-MODAL REGRESSION CNN

Abstract

Recovering audio-visual synchronization is an important task in the field of visual speech processing. In this paper, we present a multi-modal regression model that uses a convolutional neural network (CNN) to recover audio-visual synchronization in single-person speech videos. The proposed model takes the audio and visual features of multiple frames as input and predicts the drift, in frames, between the input audio-visual pair. We treat this synchronization task as a regression problem, so the model does not need to search over a sliding window, which would increase the computational cost. Experimental results show that the proposed method outperforms baseline methods in both recovery accuracy and computational cost.
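To make the contrast concrete, the sliding-window baseline the abstract alludes to scores every candidate offset separately, one pass per offset, whereas the proposed regression CNN predicts the drift in a single forward pass. The following toy sketch (my own illustration, not the paper's code: the feature tensors, `max_drift` window, and dot-product score are all assumptions) shows the baseline's per-offset search that the regression formulation avoids:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 50, 8                    # number of frames, feature dimension
true_drift = 3                  # toy ground truth: audio lags video by 3 frames

video = rng.standard_normal((T, D))          # stand-in for visual features
audio = np.roll(video, true_drift, axis=0)   # stand-in for audio features, shifted

def sliding_window_sync(video, audio, max_drift=10):
    """Baseline: score every candidate offset (2*max_drift+1 passes),
    return the offset with the highest audio-visual correlation."""
    scores = {}
    for d in range(-max_drift, max_drift + 1):
        shifted = np.roll(audio, -d, axis=0)         # undo candidate drift d
        scores[d] = float((video * shifted).sum())   # correlation score
    return max(scores, key=scores.get)

print(sliding_window_sync(video, audio))  # recovers 3
```

A regression model instead maps the concatenated audio-visual features directly to a single scalar drift estimate, so inference cost does not grow with the size of the search window.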
