IEEE/ACM Transactions on Audio, Speech, and Language Processing

Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition

Abstract

We present a Bayesian framework for obtaining maximum a posteriori (MAP) estimates of a small set of hidden activation function parameters in context-dependent deep neural network hidden Markov model (CD-DNN-HMM) based automatic speech recognition (ASR) systems. When applied to speaker adaptation, we aim at transfer learning from a well-trained deep model intended for “general” usage to a “personalized” model geared toward a particular talker, using a collection of speaker-specific data. To make the framework applicable to practical situations, we perform adaptation in an unsupervised manner, assuming that transcriptions of the adaptation utterances are not readily available to the ASR system. We conduct a series of comprehensive batch adaptation experiments on the Switchboard ASR task and show that the proposed approach is effective even for CD-DNN-HMMs built with discriminative sequential training. Indeed, MAP speaker adaptation reduces the word error rate (WER) from an initial 21.9% to 20.1% on the full NIST 2000 Hub5 benchmark test set. Moreover, MAP speaker adaptation compares favorably with other techniques evaluated on the same speech tasks. We also demonstrate its complementarity to other approaches by applying MAP adaptation to CD-DNN-HMMs trained on speaker-adaptive features generated through constrained maximum likelihood linear regression, further reducing the WER to 18.6%. Leveraging the intrinsic recursive nature of Bayesian adaptation and mitigating possible system constraints on batch learning, we also propose an incremental approach to unsupervised online speaker adaptation that sequentially updates the hyperparameters of the approximate posterior densities together with the DNN parameters. The advantage of such a sequential learning algorithm over a batch method lies not necessarily in final performance, but in computational efficiency and reduced storage needs, since it does not have to wait for all the data to be processed. So far, the experimental results are promising.
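To make the batch adaptation step concrete, the sketch below performs MAP re-estimation of per-unit activation slope and bias parameters against labels from a first-pass decode, which is what makes the procedure unsupervised. This is a minimal PyTorch sketch under assumed modeling choices (a parameterized sigmoid activation and a Gaussian prior centered at the speaker-independent values); the names `AdaptiveSigmoid` and `map_adapt`, the prior weight `tau`, and the optimizer settings are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class AdaptiveSigmoid(nn.Module):
    """Sigmoid whose per-unit slope and bias can be adapted per speaker."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))   # slope; SI value = 1
        self.beta = nn.Parameter(torch.zeros(dim))   # shift; SI value = 0

    def forward(self, pre_activation):
        return torch.sigmoid(self.alpha * pre_activation + self.beta)

def map_adapt(model, act_layers, frames, pseudo_labels,
              tau=1.0, lr=1e-2, steps=10):
    """MAP re-estimation of activation parameters on one speaker's data.

    frames / pseudo_labels: adaptation features and senone labels taken
    from a first-pass decode (unsupervised). Only the activation
    parameters are updated; a Gaussian prior centered at the
    speaker-independent values, weighted by tau, ties them to the
    unadapted model.
    """
    params, priors = [], []
    for layer in act_layers:
        for p in (layer.alpha, layer.beta):
            params.append(p)
            priors.append(p.detach().clone())  # speaker-independent mean
    opt = torch.optim.SGD(params, lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = ce(model(frames), pseudo_labels)    # -log p(data | params)
        for p, p0 in zip(params, priors):          # -log Gaussian prior
            loss = loss + 0.5 * tau * ((p - p0) ** 2).sum()
        loss.backward()
        opt.step()
```

In this reading, `tau` controls how strongly the adapted parameters are pulled toward the speaker-independent model; with little adaptation data a larger `tau` keeps the MAP estimate close to the prior, which is the usual behavior of MAP adaptation.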
