Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition



Abstract

To learn auditory filterbanks, we recently proposed an unsupervised learning model based on a convolutional restricted Boltzmann machine (RBM) with rectified linear units. In this paper, we present the theory and training algorithm of the proposed model, along with a detailed analysis of the learned filterbank. Training the model on different databases shows that it is able to learn cochlear-like impulse responses that are localized in the frequency domain. An auditory-like scale obtained from filterbanks learned on clean and noisy datasets resembles the Mel scale, which is known to mimic perceptually relevant aspects of speech. We have experimented with both cepstral features (denoted as ConvRBM-CC) and filterbank features (denoted as ConvRBM-BANK). On a large vocabulary continuous speech recognition task, we achieved a relative improvement in word error rate (WER) of 7.21-17.8% compared to Mel frequency cepstral coefficient (MFCC) features and 1.35-6.82% compared to Mel filterbank (FBANK) features. On the AURORA 4 multicondition training database, a relative improvement in WER of 4.8-13.65% was achieved using a hybrid Deep Neural Network-Hidden Markov Model (DNN-HMM) system with ConvRBM-CC features. Using ConvRBM-BANK features, we achieve an absolute reduction in WER of 1.25-3.85% on the AURORA 4 test sets compared to FBANK features. A context-dependent DNN-HMM system further improves performance, with a relative improvement of 3.6-4.6% on average for the bigram 5k and trigram 5k language models. Hence, our proposed learned filterbank performs better than traditional MFCC and Mel filterbank features for both clean and multicondition automatic speech recognition (ASR) tasks. A system combination of ConvRBM-BANK and FBANK features further improves performance in all ASR tasks. Cross-domain experiments, where subband filters trained on one database are used for the ASR task of another database, show that the model learns generalized representations of speech signals.
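The approach applies a 1-D convolutional RBM directly to raw speech samples, with rectified linear hidden units, and the learned convolution kernels then act as subband filters. The following is a minimal sketch of such a model trained with one-step contrastive divergence (CD-1), not the authors' implementation: the filter count, filter length, learning rate, Gaussian visible units, and the noisy-ReLU sampling form max(0, x + N(0, sigmoid(x))) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of unsupervised subband-filter learning with a
# 1-D convolutional RBM whose hidden units are noisy rectified linear units, trained by
# one-step contrastive divergence (CD-1) on raw speech segments. The filter count (40),
# filter length (128 samples), learning rate, and Gaussian visible units are assumptions.
import torch
import torch.nn.functional as F

n_filters, filter_len, lr = 40, 128, 1e-4
W   = (0.01 * torch.randn(n_filters, 1, filter_len)).requires_grad_()  # subband filters
b_h = torch.zeros(n_filters, requires_grad=True)                       # hidden biases
b_v = torch.zeros(1)                                                    # visible bias

def hidden_pre(v):
    """Pre-activations of the hidden feature maps for raw speech v: (batch, 1, samples)."""
    return F.conv1d(v, W, bias=b_h)

def sample_hidden(pre):
    """Noisy rectified linear units: max(0, x + N(0, sigmoid(x)))."""
    return torch.relu(pre + torch.randn_like(pre) * torch.sigmoid(pre).sqrt())

def reconstruct(h):
    """Map hidden feature maps back to the waveform domain (Gaussian visible units)."""
    return F.conv_transpose1d(h, W) + b_v

def cd1_step(v0):
    """One CD-1 update on a batch of mean/variance-normalized speech segments."""
    pre0 = hidden_pre(v0)
    h0 = sample_hidden(pre0).detach()            # positive phase (sampled hidden maps)
    v1 = reconstruct(h0).detach()                # reconstruction of the waveform
    pre1 = hidden_pre(v1)
    h1 = torch.relu(pre1).detach()               # negative phase (mean-field)
    # d/dW of (conv(v, W) * h).sum() is the visible-hidden correlation, so the gradient
    # of this surrogate is exactly the CD-1 estimate (positive phase - negative phase).
    surrogate = (pre0 * h0).sum() - (pre1 * h1).sum()
    surrogate.backward()
    with torch.no_grad():
        W.add_(lr * W.grad);     W.grad.zero_()
        b_h.add_(lr * b_h.grad); b_h.grad.zero_()
        b_v.add_(lr * (v0 - v1).mean())          # visible-bias update
    return (v0 - v1).pow(2).mean().item()        # reconstruction error, for monitoring
```

Under these assumptions, training would iterate cd1_step over batches of raw speech segments; the learned kernels in W can then be inspected as impulse responses (e.g., sorted by center frequency to recover the auditory-like scale discussed in the abstract), with ConvRBM-BANK features obtained by filtering, rectifying, and pooling, and ConvRBM-CC adding a DCT step analogous to the MFCC pipeline.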
