Proceedings of the 3rd International Universal Communication Symposium

Normalization on the modulation spectrum of the subband temporal envelopes for automatic speech recognition in reverberant environments


Abstract

In this study, we proposed a feature extraction method for reverberated speech recognition based on subband temporal envelopes (STEs) and their normalization. The STEs were extracted using a series of constant-bandwidth band-pass filters with the Hilbert transform, followed by low-pass filtering. In the normalization step, the modulation spectra (MS) of the subband temporal envelopes of both the clean and the reverberated speech were normalized to a reference MS calculated from a clean speech data set. The inverse Fourier transform of the normalized subband MS was then used to restore the subband temporal envelopes. We tested the proposed method on speech recognition in a reverberant room at different speaker-to-microphone distances (SMDs). For comparison, the recognition performance of traditional Mel-cepstral coefficients with mean and variance normalization was used as the baseline. Experimental results showed that, averaged over SMDs from 50 cm to 400 cm, subband temporal envelope processing alone gave a 44.96% relative improvement, and normalization of the subband modulation spectrum gave a further 15.68% relative improvement. In total, the relative improvement was about 53.59%, which was better than that obtained with other temporal filtering and normalization methods.
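The processing chain described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no filter design, band count, bandwidth, envelope cutoff, or exact normalization rule, so every parameter below (`n_bands`, `band_hz`, `f_start`, `env_cutoff_hz`, `seg_len`) is a hypothetical choice. The brick-wall FFT filters and the per-bin gain used for normalization are likewise assumptions made only to keep the example short.

```python
import numpy as np


def subband_envelopes(x, fs, n_bands=8, band_hz=400.0, f_start=100.0,
                      env_cutoff_hz=20.0):
    """Return an (n_bands, len(x)) array of low-pass-filtered Hilbert envelopes.

    Assumed design: ideal (brick-wall) constant-bandwidth bands applied in the
    FFT domain. All default values are illustrative, not from the paper.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)        # signed frequency per FFT bin
    env_freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # bins used for the low-pass
    envelopes = np.empty((n_bands, n))
    for k in range(n_bands):
        lo = f_start + k * band_hz
        hi = lo + band_hz
        # Keeping only the positive-frequency bins in [lo, hi) and doubling
        # them gives the analytic signal of the band-passed subband.
        mask = (freqs >= lo) & (freqs < hi)
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        env = np.abs(analytic)                   # Hilbert envelope
        # Brick-wall low-pass filtering of the envelope.
        env_spec = np.fft.rfft(env)
        env_spec[env_freqs > env_cutoff_hz] = 0.0
        envelopes[k] = np.fft.irfft(env_spec, n=n)
    return envelopes


def modulation_spectrum(env, seg_len):
    """Mean magnitude spectrum over non-overlapping segments of one envelope."""
    n_seg = len(env) // seg_len
    segs = env[: n_seg * seg_len].reshape(n_seg, seg_len)
    return np.abs(np.fft.rfft(segs, axis=1)).mean(axis=0)


def normalize_envelope(env, ref_ms, seg_len, eps=1e-12):
    """Scale each modulation-frequency bin so the utterance MS matches ref_ms.

    Assumed rule: one real gain per bin, phase left untouched, followed by an
    inverse Fourier transform to restore the envelope. Trailing samples that
    do not fill a whole segment are dropped.
    """
    n_seg = len(env) // seg_len
    segs = env[: n_seg * seg_len].reshape(n_seg, seg_len)
    spec = np.fft.rfft(segs, axis=1)
    gain = ref_ms / (np.abs(spec).mean(axis=0) + eps)
    return np.fft.irfft(spec * gain, n=seg_len, axis=1).reshape(-1)
```

In this sketch, a reference MS would be obtained by averaging `modulation_spectrum` over the envelopes of a clean data set, one reference per subband, and `normalize_envelope` would then be applied to every training and test utterance. Separately, if the reported percentages are read as relative reductions of the same error measure (an assumption, since the abstract does not name the metric), the two stages compound as 1 - (1 - 0.4496)(1 - 0.1568) ≈ 0.5359, which is consistent with the reported total of about 53.59%.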
