Power-Normalized Cepstral Coefficients (PNCC) for robust speech recognition

Abstract

This paper presents a new feature extraction algorithm called Power Normalized Cepstral Coefficients (PNCC) that is based on auditory processing. Major new features of PNCC processing include the use of a power-law nonlinearity that replaces the traditional log nonlinearity used in MFCC coefficients, a noise-suppression algorithm based on asymmetric filtering that suppresses background excitation, and a module that accomplishes temporal masking. We also propose the use of medium-time power analysis, in which environmental parameters are estimated over a longer duration than is commonly used for speech, as well as frequency smoothing. Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing, and without degrading the recognition accuracy that is observed while training and testing using clean speech. PNCC processing also provides better recognition accuracy in noisy environments than techniques such as Vector Taylor Series (VTS) and the ETSI Advanced Front End (AFE) while requiring much less computation. We describe an implementation of PNCC using “on-line processing” that does not require future knowledge of the input.
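
Two of the ideas named in the abstract, the power-law nonlinearity and asymmetric filtering for background suppression, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The exponent 1/15, the two forgetting factors (0.999 and 0.5), the initialization of the filter, and all function names are assumptions chosen for illustration; none of them is stated in the abstract. The full algorithm also includes medium-time averaging, temporal masking, and frequency smoothing, which are omitted here.

```python
import math


def power_law(power, exponent=1.0 / 15.0):
    """Power-law compression of a non-negative power value.

    The exponent 1/15 is an assumed value, not taken from the abstract.
    Unlike a logarithm, the result stays finite as power approaches 0.
    """
    return power ** exponent


def log_compress(power, floor=1e-10):
    """Log compression as used in MFCC-style front ends (floored to avoid log(0))."""
    return math.log(max(power, floor))


def asymmetric_lowpass(q_in, lam_rise=0.999, lam_fall=0.5):
    """Estimate a slowly varying lower envelope of one channel's power sequence.

    The estimate rises slowly (lam_rise close to 1) and falls quickly
    (lam_fall smaller), so it follows the background level rather than
    speech onsets. Both constants are assumptions for this sketch.
    """
    q_out = []
    prev = q_in[0]  # assumed initialization: start at the first sample
    for q in q_in:
        lam = lam_rise if q >= prev else lam_fall
        prev = lam * prev + (1.0 - lam) * q
        q_out.append(prev)
    return q_out


def suppress_background(q_in, lam_rise=0.999, lam_fall=0.5):
    """Subtract the lower-envelope estimate and half-wave rectify the result."""
    floor = asymmetric_lowpass(q_in, lam_rise, lam_fall)
    return [max(q - f, 0.0) for q, f in zip(q_in, floor)]


if __name__ == "__main__":
    # Constant background power of 1.0 with a short burst of 10.0.
    channel_power = [1.0] * 5 + [10.0] * 3 + [1.0] * 5
    cleaned = suppress_background(channel_power)
    features = [power_law(p) for p in cleaned]
    print(cleaned)
    print(features)
```

In this toy input the constant background is removed almost entirely while the burst is largely preserved, which is the qualitative behavior the abstract attributes to asymmetric filtering.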

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 4101-4104 (4 pages)
Conference location: Kyoto (JP)
Authors
Kim, Chanwoo;
Author affiliation

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Original format: PDF

Similar literature

1. Enhanced Automatic Speech Recognition System Based on Enhancing Power-Normalized Cepstral Coefficients [J]. Mohamed Tamazin, Ahmed Gouda, Mohamed Khedr. Applied Sciences. 2019, no. 10
2. Robust optimal sub-band wavelet cepstral coefficient method for speech recognition [J]. John Sahaya Rani Alex, Nithya Venkatesan. International Journal of Computer Aided Engineering and Technology. 2019, no. 2
3. Robust Speech Recognition Using Perceptual Wavelet Denoising and Mel-frequency Product Spectrum Cepstral Coefficient Features [J]. M.C.A. Korba, D. Messadeg, R. Djemili. Informatica: An International Journal of Computing and Informatics. 2008, no. 3
4. Power-Normalized Cepstral Coefficients (PNCC) for robust speech recognition [C]. Kim Chanwoo. IEEE International Conference on Acoustics, Speech and Signal Processing. 2011
5. Estimation of cepstral coefficients for robust speech recognition [D]. Indrebo, Kevin M. 2008
6. The application of fractional Mel cepstral coefficient in deceptive speech detection [O]. Xinyu Pan, Heming Zhao, Yan Zhou
7. Power-normalized cepstral coefficients (pncc) for robust speech recognition [O]. Chanwoo Kim, Richard M. Stern. 2013