M-vectors: Sub-band Based Energy Modulation Features for Multi-stream Automatic Speech Recognition

Abstract

In this paper, we propose a novel method to capture energy modulations from different frequency bands in speech into frame-level feature vectors, called Modulation-vectors or M-vectors, for use in Automatic Speech Recognition (ASR) systems. We show that in different multi-stream setups, with parallel streams for M-vectors and the popular Mel-frequency Cepstral Coefficient (MFCC) features, we can realize a boost in word recognition performance of end-to-end systems by ≈ 5%, and that of a monophone and triphone HMM-GMM ASR system by ≈ 18% and ≈ 16% respectively over using the traditional MFCC features.

Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 6545-6549 (5 pages)
Authors
Samik Sadhu; Ruizhi Li; Hynek Hermansky;
Original format: PDF
Keywords
Mel frequency cepstral coefficient; Hidden Markov models; Speech recognition; Microsoft Windows; Frequency modulation; Frequency-domain analysis;

Similar literature

1. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [J]. Xugang Lu, Masashi Unoki, Masato Akagi. Acoustical science and technology. 2008, No. 6
2. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [J]. Masashi Unoki, Masato Akagi, Xugang Lu. Acoustical science and technology. 2008, No. 6
3. Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments [J]. Xugang Lu, Masashi Unoki, Satoshi Nakamura. Computer speech and language. 2011, No. 3
4. M-vectors: Sub-band Based Energy Modulation Features for Multi-stream Automatic Speech Recognition [C]. Samik Sadhu, Ruizhi Li, Hynek Hermansky. IEEE International Conference on Acoustics, Speech and Signal Processing. 2019
5. Ensemble feature selection for multi-stream automatic speech recognition [D]. Gelbart, David. 2008
6. A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition [O]. Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
7. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [O]. Lu, Xugang; Unoki, Masashi; Akagi, Masato. 2008
