Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

Umit H. Yapanel; John H.L. Hansen

Abstract

A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed that require only limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed perceptual minimum variance distortionless response (PMVDR) acoustic front end. The novel aspect of the algorithm is that conventional front-end processing with PMVDR and VTLN needs two separate warping phases, whereas the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
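
The idea of folding two warping phases into one can be illustrated with a small sketch. It assumes that both the perceptual warp and the speaker warp are modeled as first-order all-pass (bilinear) frequency maps, which the abstract does not state. The function names and the warp-factor values below are hypothetical and chosen only for illustration; this is not the authors' implementation. Under that assumption, applying two such warps in sequence is equivalent to applying one warp with a combined factor.

```python
import math


def allpass_warp(omega, alpha):
    """Warp a normalized angular frequency omega (radians, 0..pi).

    Uses the phase response of a first-order all-pass map with
    warp factor alpha, where |alpha| < 1.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def combine_warps(alpha_a, alpha_b):
    """Return the single warp factor equivalent to warping with
    alpha_a and then with alpha_b (composition of all-pass maps)."""
    return (alpha_a + alpha_b) / (1.0 + alpha_a * alpha_b)


# Hypothetical values, for illustration only.
ALPHA_PERCEPTUAL = 0.57   # assumed perceptual warp factor
ALPHA_SPEAKER = -0.03     # assumed speaker-specific adjustment

alpha_single = combine_warps(ALPHA_PERCEPTUAL, ALPHA_SPEAKER)

for k in range(9):
    omega = math.pi * k / 8
    # Conventional route: two separate warping phases.
    two_stage = allpass_warp(allpass_warp(omega, ALPHA_PERCEPTUAL),
                             ALPHA_SPEAKER)
    # Integrated route: one warp with the combined factor.
    one_stage = allpass_warp(omega, alpha_single)
    print(f"{omega:.4f}  {two_stage:.6f}  {one_stage:.6f}")
```

The last two printed columns agree up to floating-point error, so in this simplified model the second warping pass can be dropped and only the single combined factor needs to be estimated per speaker.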

Bibliographic details

Source: EURASIP Journal on Audio, Speech, and Music Processing, 2008, Issue 1, 13 pages
Authors: Umit H. Yapanel; John H.L. Hansen
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
Similar literature

1. Improved automatic speech recognition through speaker normalization [J]. Diego Giuliani, Matteo Gerosa, Fabio Brugnara. Computer Speech and Language, 2006, Issue 1.
2. Acoustic quality normalization for robust automatic speech recognition [J]. Ghulam Muhammad. International Journal of Speech Technology, 2007, Issue 4.
3. Fuzzy Temporal Models of Acoustic Processes in Intelligent Systems of Automatic Speech Recognition [J]. L. S. Bershtein, S. M. Kovalev. Journal of Computer and Systems Sciences International, 2004, Issue 6.
4. Towards an Intelligent Acoustic Front-End for Automatic Speech Recognition: Built-in Speaker Normalization (BISN) [C]. Umit H. Yapanel, John H. L. Hansen. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
5. Frequency warping by linear transformation, and vocal tract inversion for speaker normalization in automatic speech recognition [D]. Panchapagesan, Sankaran. 2008.
6. Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion [O]. Prasanta Kumar Ghosh, Shrikanth Narayanan.
7. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization [O]. 2008.
8. Speaker Recognition from Coded Speech and the Effects of Score Normalization [R]. Dunn, R. B., Quatieri, T. F., Reynolds, D. A. 2016.
