IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

CoGANs for Unsupervised Visual Speech Adaptation to New Speakers


Abstract

Audio-Visual Speech Recognition (AVSR) faces the difficult task of exploiting acoustic and visual cues simultaneously. Augmenting speech with the visual channel creates its own challenges, e.g. every person has unique mouth movements, making the generalization of visual models very difficult. This factor motivates our focus on the generalization of speaker-independent (SI) AVSR systems especially in noisy environments by exploiting the visual domain. Specifically, we are the first to explore the visual adaptation of an SI-AVSR system to an unknown and unlabelled speaker. We adapt an AVSR system trained in a source domain to decode samples in a target domain without the need for labels in the target domain. For the domain adaptation of the unknown speaker, we use Coupled Generative Adversarial Networks to automatically learn a joint distribution of multi-domain images. We evaluate our character-based AVSR system on the TCD-TIMIT dataset and obtain up to a 10% average improvement with respect to its AVSR system equivalent.
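The key mechanism the abstract relies on is the CoGAN weight-sharing constraint: two generators, one per image domain (e.g. source speaker vs. unknown target speaker), share their early layers so that a single latent code maps to a corresponding pair of images, which is what lets the model capture a joint distribution over the two domains without paired or labelled target data. The following is a minimal, hypothetical NumPy sketch of that constraint only (forward pass, no adversarial training); all layer sizes and names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, HIDDEN_DIM, IMG_DIM = 16, 32, 64  # illustrative sizes

# Shared early layer: identical weights for both domains -- the CoGAN
# coupling that ties the two generators to one joint distribution.
W_shared = rng.standard_normal((LATENT_DIM, HIDDEN_DIM)) * 0.1

# Domain-specific output layers: free to differ, modelling low-level
# appearance differences (e.g. between two speakers' mouth regions).
W_domain_a = rng.standard_normal((HIDDEN_DIM, IMG_DIM)) * 0.1
W_domain_b = rng.standard_normal((HIDDEN_DIM, IMG_DIM)) * 0.1

def generate_pair(z):
    """Map one latent code to a corresponding image pair, one per domain."""
    h = np.tanh(z @ W_shared)        # shared high-level representation
    img_a = np.tanh(h @ W_domain_a)  # rendering in domain A
    img_b = np.tanh(h @ W_domain_b)  # rendering in domain B
    return img_a, img_b

z = rng.standard_normal((4, LATENT_DIM))  # batch of 4 latent codes
img_a, img_b = generate_pair(z)
print(img_a.shape, img_b.shape)  # (4, 64) (4, 64)
```

In a full CoGAN, each generator would be paired with a discriminator (whose early layers are likewise tied) and trained adversarially; the sketch above only illustrates why one latent code yields semantically corresponding images in both domains.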
