Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Abstract

In this paper, we present a novel approach to front-end speech parameterization that is more robust to pitch variations than the most commonly used technique. Earlier works have shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. This, in turn, results in poor performance of speech recognition systems, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which sufficiently smooths the spectrum. The Mel-frequency cepstral coefficients (MFCC) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that the ill effects of pitch variations are significantly reduced by including the proposed spectral smoothing module. To validate this statistically, an automatic speech recognition system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module leads to a relative improvement of 12% over the baseline system employing acoustic modeling based on deep neural networks.
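
The smoothing step described in the abstract can be illustrated with a rough sketch. The abstract does not say how the authors modify EMD, so the code below is an assumption throughout: it uses plain sifting with piecewise-linear envelopes (cubic splines are more usual) and only the Python standard library. The names local_extrema, envelope, first_mode and smooth_spectrum are hypothetical, and the "spectrum" is synthetic, a slow trend plus a fast ripple standing in for pitch harmonics. Because EMD modes sum back to the input, subtracting the first (fastest-oscillating) mode is the same as reconstructing from all the remaining modes.

```python
import math

def local_extrema(x):
    """Indices of interior local maxima and minima of the sequence x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices.

    Assumption: the envelope is held flat beyond the first and last knot.
    """
    env = [0.0] * len(x)
    for i in range(knots[0] + 1):
        env[i] = x[knots[0]]
    for i in range(knots[-1], len(x)):
        env[i] = x[knots[-1]]
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            env[i] = (1.0 - t) * x[a] + t * x[b]
    return env

def first_mode(x, n_sift=10):
    """Extract the first (fastest-oscillating) mode with a fixed number of sifts.

    Returns all zeros when x has too few extrema to sift at all.
    """
    h = list(x)
    sifted = False
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        # Subtract the mean of the two envelopes from the current candidate.
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        sifted = True
    return h if sifted else [0.0] * len(x)

def smooth_spectrum(magnitude):
    """Discard the first mode; what remains is the sum of all other modes."""
    mode1 = first_mode(magnitude)
    return [m - c for m, c in zip(magnitude, mode1)]

def rms_diff(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# Synthetic magnitude spectrum (illustrative only, not data from the paper).
n_bins = 256
trend = [2.0 + math.cos(math.pi * k / (n_bins - 1)) for k in range(n_bins)]
spectrum = [t + 0.3 * math.cos(2.0 * math.pi * k / 8.0)
            for k, t in enumerate(trend)]
smoothed = smooth_spectrum(spectrum)

print("RMS deviation from trend before smoothing:", rms_diff(spectrum, trend))
print("RMS deviation from trend after smoothing: ", rms_diff(smoothed, trend))
```

In a full front end, the smoothed magnitude spectrum of each frame would then go through the usual Mel filterbank, logarithm and DCT stages to produce MFCCs; those stages are left out of this sketch.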

Bibliographic details

Source
International Conference on Signal Processing and Communication Systems, 2018, pp. 242-246 (5 pages)
Authors
B. Tarun Sai; Ishwar Chandra Yadav; S. Shahnawazuddin; Gayadhar Pradhan
Original format: PDF
Keywords
Smoothing methods; Speech recognition; Mel frequency cepstral coefficient; Feature extraction; Data mining
