Acoustic Adaptation to Dynamic Background Conditions with Asynchronous Transformations

Abstract

This paper proposes a framework for adapting to complex and non-stationary background conditions in Automatic Speech Recognition (ASR) by means of asynchronous Constrained Maximum Likelihood Linear Regression (aCMLLR) transforms and asynchronous Noise Adaptive Training (aNAT). The proposed method aims to apply, for every input frame, the feature transform that best compensates for the background. It is implemented with a new Hidden Markov Model (HMM) topology that expands the usual left-to-right HMM into parallel branches, each adapted to a different background condition, and permits transitions among them. With this topology, the adaptation requires neither ground truth nor prior knowledge about the background in each frame, because it aims to maximise the overall log-likelihood of the decoded utterance. The aCMLLR transforms can be further improved by retraining models in an aNAT fashion and by using speaker-based MLLR transforms in cascade, for efficient modelling of both background effects and speaker. An initial evaluation on a modified version of the WSJCAM0 corpus incorporating 7 different background conditions provides a benchmark against which to evaluate the use of aCMLLR transforms. The combined use of aCMLLR and MLLR in cascade achieved a relative reduction of 40.5% in Word Error Rate (WER). Finally, this selection of techniques was applied to the transcription of multi-genre media broadcasts, where the use of aNAT training, aCMLLR transforms and MLLR transforms provided a relative improvement of 2–3%.
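The idea of picking a background-specific transform per frame by maximising the utterance log-likelihood can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single diagonal Gaussian in place of the full acoustic model, one affine feature transform per background branch, and a fixed, unnormalised penalty for switching branches. The function names and the `switch_logprob` parameter are hypothetical. The only standard ingredient is the CMLLR frame score, which adds the log-determinant of the transform matrix to the Gaussian log-likelihood of the transformed frame.

```python
import numpy as np

def cmllr_frame_loglik(x, A, b, mean, var):
    """Log-likelihood of frame x under a diagonal Gaussian after the affine
    feature transform y = A x + b, including the log|det A| Jacobian term."""
    y = A @ x + b
    _, logdet = np.linalg.slogdet(A)
    gauss = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)
    return gauss + logdet

def best_transform_path(frames, transforms, mean, var, switch_logprob=-2.0):
    """Viterbi search over parallel transform branches.

    Returns, for each frame, the index of the branch (transform) on the
    path with the highest total log-score. switch_logprob is an assumed
    penalty added whenever the path moves from one branch to another."""
    T, K = len(frames), len(transforms)
    # emit[t, k]: score of frame t when compensated with transform k
    emit = np.array([[cmllr_frame_loglik(x, A, b, mean, var)
                      for (A, b) in transforms] for x in frames])
    # trans[i, j]: score of moving from branch i to branch j
    trans = np.full((K, K), switch_logprob)
    np.fill_diagonal(trans, 0.0)

    score = emit[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans        # cand[i, j]: arrive in j from i
        back[t] = np.argmax(cand, axis=0)    # best predecessor of each branch
        score = np.max(cand, axis=0) + emit[t]

    # Backtrack from the best final branch
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy data: 5 frames matching the model, then 5 frames shifted by +3,
# which the second transform (bias of -3) undoes.
frames = np.array([[0.0, 0.0]] * 5 + [[3.0, 3.0]] * 5)
identity = np.eye(2)
transforms = [(identity, np.zeros(2)), (identity, np.array([-3.0, -3.0]))]
print(best_transform_path(frames, transforms, np.zeros(2), np.ones(2)))
# prints [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

No background labels are supplied to the search: the branch sequence falls out of the likelihood maximisation, which mirrors the abstract's point that no ground truth about the per-frame background is needed. In a full recogniser the branches would be copies of the phone HMM states rather than a single Gaussian.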

Bibliographic details

  • Authors

    Saz O.; Hain T.

  • Year: 2017
  • Original format: PDF
  • Language of text: en