Building Real-Time Speech Recognition Without CMVN

Abstract

Estimating cepstral mean and variance normalization (CMVN) statistics in run-on and real-time settings poses several challenges. Using a moving average for mean and variance estimation requires a comparatively long history of data from a speaker, which is not appropriate for short utterances or conversations. Using a pre-estimated global CMVN for speakers instead reduces recognition performance because of potential mismatch between training and testing data. This paper investigates how to build a real-time run-on speech recognition system using acoustic features without applying CMVN. We propose a feature extraction architecture that transforms unnormalized log mel features into normalized bottleneck features without using historical data. We empirically show that mean and variance normalization is not critical for training neural networks on speech data. Using the proposed feature extraction, we achieved a 4.1% word error rate reduction compared to global CMVN on the Skype conversations test set. We also reveal many cases in which features without zero mean can be learnt well by neural networks, which stands in contrast to prior work.
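
The two baselines the abstract contrasts can be illustrated with a minimal sketch. It is not the paper's implementation: the exponential moving average, the smoothing factor `alpha`, the initialisation from the first frame, and the function names are all assumptions chosen for illustration. It shows why a running estimate needs a long history: on a short utterance the statistics have not converged, so the output is still far from zero mean.

```python
import math


def global_cmvn(frames, mean, std):
    """Normalize every frame with fixed, pre-estimated per-dimension statistics."""
    return [[(x - m) / s for x, m, s in zip(frame, mean, std)] for frame in frames]


def running_cmvn(frames, alpha=0.995, eps=1e-8):
    """Normalize frames online using exponential moving averages.

    alpha is a hypothetical smoothing factor. The mean is initialised from
    the first frame and the variance from 1.0 (both are assumptions), so
    early frames are normalized with unreliable statistics.
    """
    mean = list(frames[0])
    var = [1.0] * len(mean)
    out = []
    for frame in frames:
        # Update the running mean first, then the running variance around it.
        mean = [alpha * m + (1 - alpha) * x for m, x in zip(mean, frame)]
        var = [alpha * v + (1 - alpha) * (x - m) ** 2
               for v, x, m in zip(var, frame, mean)]
        out.append([(x - m) / math.sqrt(v + eps)
                    for x, m, v in zip(frame, mean, var)])
    return out


# A short synthetic "utterance": 50 one-dimensional frames with true mean 12
# and true variance 2. Real log mel features would have more dimensions.
frames = [[10.0 + (t % 5)] for t in range(50)]

with_global = global_cmvn(frames, mean=[12.0], std=[math.sqrt(2.0)])
with_running = running_cmvn(frames)

print(sum(f[0] for f in with_global) / len(frames))   # average after global CMVN
print(sum(f[0] for f in with_running) / len(frames))  # average after running CMVN
```

Here the global statistics happen to match the data exactly, so the first average is zero; the mismatch problem the abstract describes arises when the pre-estimated statistics come from different speakers or conditions than the test data.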

Bibliographic information

Source
International Conference on Speech and Computer, 2018, pp. 451-460 (10 pages)
Authors
Thai Son Nguyen; Matthias Sperber; Sebastian Stueker; Alex Waibel
Original format
PDF
Keywords
Real-time speech recognition; Feature normalization; Neural network
