Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

M-vectors: Sub-band Based Energy Modulation Features for Multi-stream Automatic Speech Recognition


Abstract

In this paper, we propose a novel method to capture energy modulations from different frequency bands in speech into frame-level feature vectors, called Modulation-vectors or M-vectors, for use in Automatic Speech Recognition (ASR) systems. We show that in different multi-stream setups, with parallel streams for M-vectors and the popular Mel-frequency Cepstral Coefficient (MFCC) features, word recognition performance improves by ≈ 5% for end-to-end systems, and by ≈ 18% and ≈ 16% for monophone and triphone HMM-GMM ASR systems, respectively, compared with using the traditional MFCC features.
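The abstract does not spell out how M-vectors are computed, so the following is only a rough illustration of the general idea of frame-level sub-band energy modulation features, not the paper's algorithm. Every name and parameter here (`subband_log_energies`, `modulation_features`, equal-width bands, an 8-frame context window, 3 modulation bins) is an assumption made for the sketch.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames (no padding)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def dft_power(frame):
    """One-sided power spectrum via a naive DFT (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def subband_log_energies(x, frame_len=64, hop=32, n_bands=4):
    """Per-frame log energy in n_bands equal-width frequency bands.

    Assumption: equal-width bands; any bins left over at the top are ignored.
    """
    out = []
    for frame in frame_signal(x, frame_len, hop):
        p = dft_power(frame)
        width = len(p) // n_bands
        out.append([
            math.log(sum(p[b * width:(b + 1) * width]) + 1e-10)
            for b in range(n_bands)
        ])
    return out


def modulation_features(log_e, context=8, n_mod=3):
    """For each frame, describe how each band's log energy varies over time.

    Takes a window of `context` frames around the frame (clamped at the
    edges), removes the mean of each band's trajectory, and keeps the
    magnitudes of the first n_mod non-DC DFT bins of that trajectory.
    """
    n_frames, n_bands = len(log_e), len(log_e[0])
    feats = []
    for i in range(n_frames):
        idx = [min(max(i - context // 2 + j, 0), n_frames - 1) for j in range(context)]
        vec = []
        for b in range(n_bands):
            traj = [log_e[k][b] for k in idx]
            mean = sum(traj) / context
            for m in range(1, n_mod + 1):
                coef = sum(
                    (traj[j] - mean) * cmath.exp(-2j * math.pi * m * j / context)
                    for j in range(context)
                )
                vec.append(abs(coef) / context)
        feats.append(vec)
    return feats


# Synthetic test input: a 1 kHz tone with a slow 4 Hz amplitude modulation.
fs = 8000
signal = [
    (1 + 0.8 * math.sin(2 * math.pi * 4 * t / fs)) * math.sin(2 * math.pi * 1000 * t / fs)
    for t in range(2048)
]
feats = modulation_features(subband_log_energies(signal))
print(len(feats), len(feats[0]))  # number of frames, feature dimension
```

Each frame ends up with one vector of `n_bands * n_mod` values. In a multi-stream setup of the kind the abstract mentions, such vectors would feed a stream parallel to the MFCC stream; how the streams are combined is not stated in the abstract and is left out of the sketch.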
