A Front-End Technique for Automatic Noisy Speech Recognition

Abstract

Sounds in a real environment seldom occur in isolation; they form complex mixtures and usually occur concurrently. Auditory masking refers to the perceptual interaction between sound components. This paper proposes incorporating a model of simultaneous masking into Mel frequency cepstral coefficient (MFCC) extraction, which effectively improves the performance of the resulting system. In addition, Gammatone frequency integration is used to warp the energy spectrum; it provides gradually decaying weights and compensates for the loss of spectral correlation. Experiments are carried out on the Aurora-2 database, and the acoustic model is a deep neural network (DNN-HMM) trained with a frame-level cross-entropy criterion. With models trained on multi-condition speech data, the proposed feature extraction method achieves accuracies of up to 98.14% at 10 dB SNR, 94.40% at 5 dB, 81.67% at 0 dB, and 51.5% at −5 dB.
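The abstract gives no formulas, so the following is only a rough sketch of what "Gammatone frequency integration" of an energy spectrum can look like; it is not the authors' implementation and it does not model simultaneous masking. It assumes the commonly used ERB-rate scale of Glasberg and Moore, the standard fourth-order approximation of the gammatone power response, and illustrative parameters that do not come from the paper (8 kHz sampling, 25 ms frames, 10 ms hop, 23 filters, 13 cepstral coefficients). All function names are hypothetical.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    # ERB-rate scale (Glasberg & Moore); assumed here, not stated in the abstract.
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    # Inverse of hz_to_erb_rate.
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def gammatone_weights(n_filters, n_fft, sample_rate, f_min=50.0, order=4):
    """Smoothly decaying spectral weights, one row per filter."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # Centre frequencies equally spaced on the ERB-rate scale (endpoints dropped).
    e_points = np.linspace(hz_to_erb_rate(f_min),
                           hz_to_erb_rate(sample_rate / 2.0),
                           n_filters + 2)
    centers = erb_rate_to_hz(e_points[1:-1])
    erb = 24.7 * (4.37 * centers / 1000.0 + 1.0)
    bandwidth = 1.019 * erb
    # Approximate gammatone power response: (1 + ((f - fc) / b)^2)^(-order).
    offset = (freqs[None, :] - centers[:, None]) / bandwidth[:, None]
    weights = (1.0 + offset ** 2) ** (-order)
    return weights, centers

def dct_ii(x, n_coeffs):
    # Unnormalised DCT-II along the last axis, keeping the first n_coeffs terms.
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def gammatone_cepstra(signal, sample_rate=8000, frame_len=200, hop=80,
                      n_fft=256, n_filters=23, n_coeffs=13):
    """Frame -> power spectrum -> gammatone-weighted band energies -> log -> DCT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    weights, _ = gammatone_weights(n_filters, n_fft, sample_rate)
    band_energy = power @ weights.T
    # Small floor avoids log(0) on silent frames.
    return dct_ii(np.log(band_energy + 1e-10), n_coeffs)
```

Unlike triangular Mel filters, which drop to zero at the neighbouring centre frequencies, each weight curve here decays gradually and overlaps many neighbouring bands, which is one plausible reading of the "gradually decaying weights" mentioned in the abstract. Calling `gammatone_cepstra` on one second of 8 kHz audio with the defaults returns a 98 × 13 feature matrix.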

Bibliographic details

Source
Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques | 2020 | pp. 49-54 | 6 pages
Authors
Hay Mar Soe Naing; Risanuri Hidayat; Rudy Hartanto; Yoshikazu Miyanaga
Original format: PDF
Keywords
Masking threshold; Hidden Markov models; Mel frequency cepstral coefficient; Feature extraction; Noise measurement; Shape; Ear

Similar literature

1. Automatic Speech Recognition System Based on Hybrid Feature Extraction Techniques Using TEO-PWP for in Real Noisy Environment [J]. Wafa Helali, Zied Hajaiej, Adnen Cherif. International Journal of Computer Science and Network Security. 2019, No. 10
2. A Statistical Analysis on the Impact of Speech Enhancement Techniques on the Feature Vectors of Noisy Speech Signals for Speech Recognition [J]. Swapnanil Gogoi, Utpal Bhattacharjee. Journal of Computer Science Engineering and Information Technology Research. 2016, No. 3
3. Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition [C]. Meet H. Soni, Sorial Joshi, Ashish Panda. International Conference on Text, Speech, and Dialogue. 2019
4. A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques [D]. Singh, Amriteshwar. 2011
5. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference [O]. Byeongwook Lee, Kwang-Hyun Cho
6. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [O]. Lu, Xugang; Unoki, Masashi; Akagi, Masato. 2008
