IEEE International Conference on Acoustics, Speech and Signal Processing

WATCH, LISTEN ONCE, AND SYNC: AUDIO-VISUAL SYNCHRONIZATION WITH MULTI-MODAL REGRESSION CNN

Abstract

Recovering audio-visual synchronization is an important task in the field of visual speech processing. In this paper, we present a multi-modal regression model that uses a convolutional neural network (CNN) to recover audio-visual synchronization in single-person speech videos. The proposed model takes the audio and visual features of multiple frames as input and predicts the drift, in frames, between the input audio-visual pair. We treat this synchronization task as a regression problem, so the model does not need to search over a sliding window, which would increase the computational cost. Experimental results show that the proposed method outperforms baseline methods in both recovery accuracy and computational cost.
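To make the contrast concrete, the sliding-window baseline the abstract alludes to scores every candidate offset separately, one pass per offset, whereas the proposed regression CNN predicts the drift in a single forward pass. The following toy sketch (my own illustration, not the paper's code: the feature tensors, `max_drift` window, and dot-product score are all assumptions) shows the baseline's per-offset search that the regression formulation avoids:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 50, 8                    # number of frames, feature dimension
true_drift = 3                  # toy ground truth: audio lags video by 3 frames

video = rng.standard_normal((T, D))          # stand-in for visual features
audio = np.roll(video, true_drift, axis=0)   # stand-in for audio features, shifted

def sliding_window_sync(video, audio, max_drift=10):
    """Baseline: score every candidate offset (2*max_drift+1 passes),
    return the offset with the highest audio-visual correlation."""
    scores = {}
    for d in range(-max_drift, max_drift + 1):
        shifted = np.roll(audio, -d, axis=0)         # undo candidate drift d
        scores[d] = float((video * shifted).sum())   # correlation score
    return max(scores, key=scores.get)

print(sliding_window_sync(video, audio))  # recovers 3
```

A regression model instead maps the concatenated audio-visual features directly to a single scalar drift estimate, so inference cost does not grow with the size of the search window.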
