IEEE International Conference on Acoustics, Speech and Signal Processing

Visually Guided Self Supervised Learning of Speech Representations

Abstract

Self-supervised representation learning has recently attracted considerable research interest in both the audio and visual modalities. However, most work focuses on a single modality or feature, and very little of it studies the interaction between the two modalities for learning self-supervised representations. We propose a framework for learning audio representations guided by the visual modality in the context of audiovisual speech. We employ a generative audio-to-video training scheme in which we animate a still image corresponding to a given audio clip and optimize the generated video to be as close as possible to the real video of the speech segment. Through this process, the audio encoder network learns useful speech representations, which we evaluate on emotion recognition and speech recognition. We achieve state-of-the-art results for emotion recognition and competitive results for speech recognition. This demonstrates the potential of visual supervision for learning audio representations, a novel form of self-supervised learning that has not been explored before. The proposed unsupervised audio features can leverage a virtually unlimited amount of unlabelled audiovisual speech as training data and have a large number of potentially promising applications.
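
The audio-to-video training scheme described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the audio encoder and the generator are reduced to single linear maps (`W_enc`, `W_gen`), frames are flattened pixel vectors, the reconstruction loss is a mean squared error, and the names `matvec`, `init_weights` and `train_step` are hypothetical. The abstract does not specify the actual architecture or losses.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def init_weights(rows, cols, rng, scale=0.5):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def train_step(W_enc, W_gen, audio, still, real, lr=0.05):
    """One gradient step on a video reconstruction loss (MSE is an assumption).

    audio: audio feature vector for one frame
    still: flattened still image that is being animated
    real:  flattened real video frame aligned with `audio`
    Updates W_enc and W_gen in place and returns the loss before the update.
    """
    z = matvec(W_enc, audio)                       # audio representation
    offset = matvec(W_gen, z)                      # audio-driven change to the still image
    fake = [s + o for s, o in zip(still, offset)]  # generated frame
    resid = [f - r for f, r in zip(fake, real)]
    n = len(real)
    loss = sum(e * e for e in resid) / n

    # Backpropagate by hand through the two linear maps.
    g_fake = [2.0 * e / n for e in resid]          # dL/d(fake)
    g_z = [sum(W_gen[i][k] * g_fake[i] for i in range(n))
           for k in range(len(z))]                 # dL/dz, computed before W_gen changes
    for i in range(n):
        for k in range(len(z)):
            W_gen[i][k] -= lr * g_fake[i] * z[k]
    for k in range(len(z)):
        for j in range(len(audio)):
            W_enc[k][j] -= lr * g_z[k] * audio[j]  # the encoder learns from video error only
    return loss

if __name__ == "__main__":
    rng = random.Random(0)
    W_enc = init_weights(3, 4, rng)    # 4 audio features -> 3-dim representation
    W_gen = init_weights(6, 3, rng)    # representation -> change over 6 "pixels"
    audio = [rng.uniform(-1, 1) for _ in range(4)]
    still = [rng.random() for _ in range(6)]
    real = [rng.random() for _ in range(6)]
    for step in range(100):
        loss = train_step(W_enc, W_gen, audio, still, real)
    print("final reconstruction loss:", loss)
```

The point of the sketch is that no labels appear anywhere: the only training signal reaching `W_enc` is the error between the generated frame and the real frame, which is how the visual modality can supervise the audio encoder. After training, a downstream task such as emotion or speech recognition would use the encoder output `z` as its input features.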
