IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

CoGANs for Unsupervised Visual Speech Adaptation to New Speakers


Abstract

Audio-Visual Speech Recognition (AVSR) faces the difficult task of exploiting acoustic and visual cues simultaneously. Augmenting speech with the visual channel creates its own challenges, e.g. every person has unique mouth movements, making the generalization of visual models very difficult. This factor motivates our focus on the generalization of speaker-independent (SI) AVSR systems especially in noisy environments by exploiting the visual domain. Specifically, we are the first to explore the visual adaptation of an SI-AVSR system to an unknown and unlabelled speaker. We adapt an AVSR system trained in a source domain to decode samples in a target domain without the need for labels in the target domain. For the domain adaptation of the unknown speaker, we use Coupled Generative Adversarial Networks to automatically learn a joint distribution of multi-domain images. We evaluate our character-based AVSR system on the TCD-TIMIT dataset and obtain up to a 10% average improvement with respect to its AVSR system equivalent.
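The key mechanism the abstract relies on is the CoGAN weight-sharing constraint: two generators, one per image domain (e.g. source speaker vs. unknown target speaker), share their early layers so that a single latent code maps to a corresponding pair of images, which is what lets the model capture a joint distribution over the two domains without paired or labelled target data. The following is a minimal, hypothetical NumPy sketch of that constraint only (forward pass, no adversarial training); all layer sizes and names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, HIDDEN_DIM, IMG_DIM = 16, 32, 64  # illustrative sizes

# Shared early layer: identical weights for both domains -- the CoGAN
# coupling that ties the two generators to one joint distribution.
W_shared = rng.standard_normal((LATENT_DIM, HIDDEN_DIM)) * 0.1

# Domain-specific output layers: free to differ, modelling low-level
# appearance differences (e.g. between two speakers' mouth regions).
W_domain_a = rng.standard_normal((HIDDEN_DIM, IMG_DIM)) * 0.1
W_domain_b = rng.standard_normal((HIDDEN_DIM, IMG_DIM)) * 0.1

def generate_pair(z):
    """Map one latent code to a corresponding image pair, one per domain."""
    h = np.tanh(z @ W_shared)        # shared high-level representation
    img_a = np.tanh(h @ W_domain_a)  # rendering in domain A
    img_b = np.tanh(h @ W_domain_b)  # rendering in domain B
    return img_a, img_b

z = rng.standard_normal((4, LATENT_DIM))  # batch of 4 latent codes
img_a, img_b = generate_pair(z)
print(img_a.shape, img_b.shape)  # (4, 64) (4, 64)
```

In a full CoGAN, each generator would be paired with a discriminator (whose early layers are likewise tied) and trained adversarially; the sketch above only illustrates why one latent code yields semantically corresponding images in both domains.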
