Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks

Abstract

Automatic speech recognition (ASR) systems often perform poorly when they are used in an acoustic domain different from the one seen in training, for example on utterances spoken in noisy environments or in different speaking styles. We propose a novel approach to cross-domain speech recognition based on acoustic feature mappings provided by a deep neural network, which is trained using nonparallel speech corpora from two different domains and no phone labels. To train a target-domain acoustic model, we generate "fake" target speech features from the labeled source-domain features using a mapping G_f. We can also generate "fake" source features for testing from the target features using the backward mapping G_b, which is learned simultaneously with G_f. The mappings G_f and G_b are trained as adversarial networks using a conventional adversarial loss and a cycle-consistency loss criterion that encourages the backward mapping to bring the translated feature back to the original as closely as possible, such that G_b(G_f(x)) ≈ x. In a highly challenging task of model adaptation using only domain speech features, our method achieved up to 16% relative improvement in word error rate (WER) in the evaluation on the CHiME3 real test data. The backward mapping was also confirmed to be effective in a speaking-style adaptation task.
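The cycle-consistency criterion in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the mappings here are toy affine functions standing in for the neural networks, the L1 distance is an assumption (the abstract does not state which norm is used), and the names `g_f`, `g_b`, `cycle_consistency_loss`, and the feature dimension are hypothetical. The adversarial loss is omitted.

```python
import numpy as np


def cycle_consistency_loss(x, y, g_f, g_b):
    """Mean L1 distance between features and their round-trip reconstructions.

    x: source-domain features, shape (frames, dim)
    y: target-domain features, shape (frames, dim)
    g_f: forward mapping (source -> target); g_b: backward mapping (target -> source)
    """
    forward_cycle = np.mean(np.abs(g_b(g_f(x)) - x))   # G_b(G_f(x)) should be close to x
    backward_cycle = np.mean(np.abs(g_f(g_b(y)) - y))  # G_f(G_b(y)) should be close to y
    return forward_cycle + backward_cycle


# Toy affine stand-ins for the two networks (assumption: real mappings are deep nets).
A = np.diag([1.0, 2.0, 0.5, 4.0])
b = np.array([0.5, -1.0, 0.25, 2.0])
A_inv = np.linalg.inv(A)


def g_f(x):
    return x @ A + b


def g_b(y):
    return (y - b) @ A_inv  # exact inverse of g_f


def g_b_mismatched(y):
    return y  # a backward mapping that ignores what g_f did


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))  # nonparallel batches: x and y are unrelated frames
y = rng.normal(size=(8, 4))

exact_loss = cycle_consistency_loss(x, y, g_f, g_b)
mismatched_loss = cycle_consistency_loss(x, y, g_f, g_b_mismatched)
print(f"cycle loss with inverse mapping:    {exact_loss:.2e}")
print(f"cycle loss with mismatched mapping: {mismatched_loss:.2e}")
```

The loss is near zero only when the backward mapping undoes the forward one, which is the constraint that lets the two mappings be learned from nonparallel data. In training, this term would be added to the adversarial loss with some weighting, which the abstract does not specify.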

Bibliographic record

Source
2017 IEEE Automatic Speech Recognition and Understanding Workshop | 2017 | pp. 134-140 | 7 pages
Conference location: Okinawa (JP)
Authors
Masato Mimura; Shinsuke Sakai; Tatsuya Kawahara
Author affiliation
Kyoto University, School of Informatics, Sakyo-ku, Kyoto 606-8501, Japan
Original format: PDF
Text language: English
Keywords
Acoustics; Noise measurement; Gallium nitride; Speech; Training; Speech recognition; Adaptation models
