EURASIP Journal on Audio, Speech, and Music Processing

Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

Abstract

A proven way to keep automatic speech recognition (ASR) effective in the presence of speaker differences is to apply speaker normalization to the acoustic features. More effective speaker normalization methods are needed that require only limited computing resources, so that they can run in real time. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novelty of the algorithm is that conventional front-end processing with PMVDR and VTLN needs two separate warping phases, whereas the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the word error rate (WER) by a relative 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement is 9%, both relative to the baseline speaker normalization method.
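The abstract does not give the form of the warping function, so the following is only a minimal sketch of why two warping phases can collapse into one. It assumes both warps are first-order all-pass (bilinear) frequency warps, a common choice for perceptual warping. Under that assumption, applying a warp with parameter `alpha_a` and then one with `alpha_b` is equivalent to a single warp with parameter `(alpha_a + alpha_b) / (1 + alpha_a * alpha_b)`. The function names and the two parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def allpass_warp(omega, alpha):
    """Warp normalized frequencies omega (radians, 0..pi) with a
    first-order all-pass (bilinear) map of parameter alpha, |alpha| < 1.
    Assumed warp form; the abstract does not specify it."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def combine_alphas(alpha_a, alpha_b):
    """Parameter of the single all-pass warp equivalent to applying
    alpha_a first and alpha_b second."""
    return (alpha_a + alpha_b) / (1.0 + alpha_a * alpha_b)

omega = np.linspace(0.0, np.pi, 257)

# Illustrative values only (assumptions, not from the paper).
alpha_perceptual = 0.57   # fixed perceptual warp
alpha_speaker = -0.03     # small speaker-dependent adjustment

# Conventional pipeline: two separate warping phases.
two_stage = allpass_warp(allpass_warp(omega, alpha_perceptual), alpha_speaker)

# Single speaker-dependent warp covering both effects.
alpha_single = combine_alphas(alpha_perceptual, alpha_speaker)
one_stage = allpass_warp(omega, alpha_single)

max_diff = float(np.max(np.abs(two_stage - one_stage)))
print("largest difference between the two pipelines:", max_diff)
```

In this sketch the per-speaker search would run over `alpha_single` directly, so only one warp has to be evaluated per frame. Whether BISN selects its warp parameter this way is not stated in the abstract.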
