IEEE International Conference on Acoustics, Speech and Signal Processing

Visually Guided Self Supervised Learning of Speech Representations

Abstract

Self-supervised representation learning has recently attracted considerable research interest in both the audio and visual modalities. However, most work focuses on a single modality or feature, and very little of it studies how the two modalities interact when learning self-supervised representations. We propose a framework for learning audio representations guided by the visual modality in the context of audiovisual speech. We employ a generative audio-to-video training scheme in which we animate a still image corresponding to a given audio clip and optimize the generated video to be as close as possible to the real video of the speech segment. Through this process, the audio encoder network learns useful speech representations, which we evaluate on emotion recognition and speech recognition. We achieve state-of-the-art results for emotion recognition and competitive results for speech recognition. This demonstrates the potential of visual supervision as a novel, previously unexplored route to self-supervised learning of audio representations. The proposed unsupervised audio features can leverage a virtually unlimited amount of unlabelled audiovisual speech as training data and have a large number of potentially promising applications.
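
The audio-to-video objective described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the linear encoder and decoder, the L1 reconstruction loss, the array shapes, and the function names (`encode_audio`, `animate`, `reconstruction_loss`) are hypothetical and are not taken from the paper, which does not specify its architecture or loss in this abstract.

```python
import numpy as np

def encode_audio(audio_frames, w_enc):
    """Toy audio encoder: map (T, n_audio) audio frames to (T, n_embed) embeddings."""
    return np.tanh(audio_frames @ w_enc)

def animate(still_image, embeddings, w_dec):
    """Toy generator: add an audio-driven offset to the still image for each frame."""
    offsets = embeddings @ w_dec                       # shape (T, H*W)
    return still_image[None, :, :] + offsets.reshape(-1, *still_image.shape)

def reconstruction_loss(generated, real):
    """Mean absolute error between generated and real video (assumed loss choice)."""
    return float(np.mean(np.abs(generated - real)))

# Random stand-ins for one training example (hypothetical sizes).
rng = np.random.default_rng(0)
T, n_audio, n_embed, H, W = 5, 16, 8, 4, 4
audio = rng.normal(size=(T, n_audio))        # audio features, one row per video frame
still = rng.normal(size=(H, W))              # the still image to animate
real_video = rng.normal(size=(T, H, W))      # the real video of the speech segment

w_enc = rng.normal(scale=0.1, size=(n_audio, n_embed))
w_dec = rng.normal(scale=0.1, size=(n_embed, H * W))

z = encode_audio(audio, w_enc)               # speech representation to be reused later
fake_video = animate(still, z, w_dec)
loss = reconstruction_loss(fake_video, real_video)
```

In a real system, `w_enc` and `w_dec` would be deep networks trained jointly by minimizing `loss`; after training, only the audio encoder's output `z` would be kept as the speech representation for downstream tasks such as emotion recognition.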
