Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation

Abstract

Domain mismatch between training and testing can lead to significant degradation in performance in many machine learning scenarios. Unfortunately, this is not a rare situation for automatic speech recognition deployments in real-world applications. Research on robust speech recognition can be regarded as trying to overcome this domain mismatch issue. In this paper, we address the unsupervised domain adaptation problem for robust speech recognition, where both source and target domain speech are available, but word transcripts are only available for the source domain speech. We present novel augmentation-based methods that transform speech in a way that does not change the transcripts. Specifically, we first train a variational autoencoder on both source and target domain data (without supervision) to learn a latent representation of speech. We then transform nuisance attributes of speech that are irrelevant to recognition by modifying the latent representations, in order to augment labeled training data with additional data whose distribution is more similar to the target domain. The proposed method is evaluated on the CHiME-4 dataset and reduces the absolute word error rate (WER) by as much as 35% compared to the non-adapted baseline.
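The latent-space transformation described in the abstract can be illustrated with a minimal sketch. It assumes the latent codes have already been produced by a trained variational autoencoder, and it uses one simple modification, shifting the source-domain codes so that their mean matches the target-domain mean. The abstract does not specify the exact operation, so this is an illustrative stand-in rather than the authors' method. The function name `shift_latents_toward_target` and the synthetic arrays are hypothetical.

```python
import numpy as np


def shift_latents_toward_target(z_source, z_target):
    """Shift source latent codes so their mean equals the target-domain mean.

    Assumption: the domain-level mean of the latent codes captures nuisance
    attributes (for example, background noise), while each utterance's
    deviation from that mean carries the content to be preserved.
    """
    mu_src = z_source.mean(axis=0)
    mu_tgt = z_target.mean(axis=0)
    return z_source - mu_src + mu_tgt


# Synthetic stand-ins for encoder outputs (hypothetical data, not CHiME-4).
rng = np.random.default_rng(0)
z_source = rng.normal(loc=0.0, scale=1.0, size=(200, 8))  # labeled source domain
z_target = rng.normal(loc=2.0, scale=1.0, size=(150, 8))  # unlabeled target domain

# Augmented codes keep a one-to-one correspondence with the source utterances,
# so the source transcripts can be reused unchanged.
z_augmented = shift_latents_toward_target(z_source, z_target)
```

In a full pipeline, the shifted codes would be passed through the VAE decoder to produce speech features, and those features would be added to the labeled training set with the original source transcripts.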

Bibliographic details

Source
2017 IEEE Automatic Speech Recognition and Understanding Workshop | 2017 | pp. 16-23 | 8 pages
Conference location: Okinawa (JP)
Authors
Wei-Ning Hsu; Yu Zhang; James Glass
Author affiliations

Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, USA (all three authors)

Original format: PDF
Language: English
Keywords
Speech; Speech recognition; Adaptation models; Robustness; Pragmatics; Decoding; Noise measurement
