Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

Sara Ahmadi; Seyed Mohammad Ahadi; Bert Cranen; Lou Boves


Abstract

The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the posterior probability that a frame of an unknown speech signal is related to a specific state. In this paper we use the raw output of a modulation spectrum analyser in combination with sparse coding as a means for obtaining state posterior probabilities. The modulation spectrum analyser uses 15 gammatone filters. The Hilbert envelope of the output of these filters is then processed by nine modulation frequency filters, with bandwidths up to 16 Hz. Experiments using the AURORA-2 task show that the novel approach is promising. We found that the representation of medium-term dynamics in the modulation spectrum analyser must be improved. We also found that we should move towards sparse classification, by modifying the cost function in sparse coding such that the class(es) represented by the exemplars weigh in, in addition to the accuracy with which unknown observations are reconstructed. This creates two challenges: (1) developing a method for dictionary learning that takes the class occupancy of exemplars into account and (2) developing a method for learning a mapping from exemplar activations to state posterior probabilities that keeps the generalization to unseen conditions that is one of the strongest advantages of sparse coding.
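The idea of turning exemplar activations into state posterior probabilities can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not specify the solver, the cost function, or the mapping. The sketch assumes a non-negative sparse coding objective solved with multiplicative updates, and a simple mapping that sums the activations of exemplars sharing a state label. All names (`sparse_activations`, `state_posteriors`, `exemplar_states`) and the random data are hypothetical, and the dimension 135 (15 gammatone channels times 9 modulation filters) is just one plausible per-frame feature size.

```python
import numpy as np

def sparse_activations(y, A, lam=0.1, n_iter=200, eps=1e-12):
    """Approximately minimise 0.5*||y - A x||^2 + lam*sum(x) subject to x >= 0.

    Uses multiplicative updates, which keep x non-negative as long as
    y and A are non-negative. This solver is an assumption of the sketch.
    """
    x = np.full(A.shape[1], 1.0 / A.shape[1])  # positive starting point
    Aty = A.T @ y
    AtA = A.T @ A
    for _ in range(n_iter):
        x *= Aty / (AtA @ x + lam + eps)
    return x

def state_posteriors(x, exemplar_states, n_states, eps=1e-12):
    """Sum the activations per state label and normalise to a distribution.

    This label-sum mapping is a hypothetical stand-in for the learned
    mapping that the abstract names as an open challenge.
    """
    scores = np.bincount(exemplar_states, weights=x, minlength=n_states)
    return scores / (scores.sum() + eps)

# Toy data: a random non-negative dictionary with unit-norm exemplars.
rng = np.random.default_rng(0)
n_dims, n_exemplars, n_states = 135, 60, 3
A = rng.random((n_dims, n_exemplars))
A /= np.linalg.norm(A, axis=0)

# Each exemplar carries the label of the state it was taken from.
exemplar_states = np.repeat(np.arange(n_states), n_exemplars // n_states)

# An "unknown" observation: one exemplar plus a small perturbation.
y = A[:, 25] + 0.01 * rng.random(n_dims)

x = sparse_activations(y, A)
post = state_posteriors(x, exemplar_states, n_states)
print(post)
```

In this sketch the cost only rewards reconstruction accuracy and sparsity. The abstract's proposal to move towards sparse classification would add a term that also depends on the classes the active exemplars represent.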


Bibliographic details

Source
EURASIP Journal on Audio, Speech, and Music Processing, 2014, issue 1, 20 pages
Authors
Sara Ahmadi; Seyed Mohammad Ahadi; Bert Cranen; Lou Boves
Original format: PDF
CLC classification: Radio electronics, telecommunication technology
