
Exploring deep learning methods for discovering features in speech signals.


Abstract

This thesis makes three main contributions to the area of speech recognition with Deep Neural Network-Hidden Markov Models (DNN-HMMs).

Firstly, we explore the effectiveness of features learnt from speech databases using Deep Learning for speech recognition. This contrasts with prior works that have largely confined themselves to using traditional features such as Mel Cepstral Coefficients and Mel log filter banks for speech recognition. We start by showing that features learnt on raw signals using Gaussian-ReLU Restricted Boltzmann Machines can achieve accuracy close to that achieved with the best traditional features. These features are, however, learnt using a generative model that ignores domain knowledge. We develop methods that use capsules to discover features endowed with meaningful semantics relevant to the domain. To this end, we extend previous work on transforming autoencoders and propose a new autoencoder with a domain-specific decoder to learn capsules from speech databases. We show that capsule instantiation parameters can be combined with Mel log filter banks to produce improvements in phone recognition on TIMIT. On WSJ the word error rate does not improve, even though we get strong gains in classification accuracy. We speculate this may be because of the mismatched objectives of word error rate over an utterance and frame error rate on the sub-phonetic class for a frame.

Secondly, we develop a method for data augmentation in speech datasets. Such methods result in strong gains in object recognition, but have largely been ignored in speech recognition. Our data augmentation encourages the learning of invariance to vocal tract length of speakers. The method is shown to improve the phone error rate on TIMIT and the word error rate on a 14-hour subset of WSJ.

Lastly, we develop a method for learning and using a longer-range model of targets, conditioned on the input. This method predicts the labels for multiple frames together and uses a geometric average of these predictions during decoding. It produces state-of-the-art results on phone recognition with TIMIT and also produces significant gains on WSJ.
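The abstract does not spell out how the multi-frame predictions are combined, so the following is only a minimal sketch of one plausible reading: each model output, centered at some frame, carries a label distribution for every frame in a symmetric window, and the distributions that refer to the same frame are merged by a renormalized geometric mean. The function names, the `predictions[t][j]` layout, the symmetric `context` window, and the `eps` floor are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def geometric_average(distributions, eps=1e-12):
    """Renormalized geometric mean of several distributions over the same labels.

    Computed in the log domain for numerical stability. `eps` is an assumed
    floor that keeps log() defined when a probability is exactly zero.
    """
    n_labels = len(distributions[0])
    log_mean = [
        sum(math.log(max(d[k], eps)) for d in distributions) / len(distributions)
        for k in range(n_labels)
    ]
    peak = max(log_mean)
    unnormalized = [math.exp(v - peak) for v in log_mean]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def combine_multiframe(predictions, context):
    """Merge overlapping multi-frame predictions into one distribution per frame.

    Hypothetical layout: predictions[t][j] is the distribution that the output
    centered at frame t assigns to frame t + j - context, for j in
    0 .. 2 * context.
    """
    n_frames = len(predictions)
    combined = []
    for t in range(n_frames):
        votes = []
        for j in range(2 * context + 1):
            source = t - (j - context)  # center frame whose slot j refers to t
            if 0 <= source < n_frames:
                votes.append(predictions[source][j])
        combined.append(geometric_average(votes))
    return combined
```

In a DNN-HMM decoder the combined per-frame distributions would stand in for the usual single-frame posteriors. Averaging in the log domain means a label has to be supported by every overlapping prediction to keep a high score, which is the main practical difference from an arithmetic average.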

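Bibliographic record

  • Author: Jaitly, Navdeep
  • Author affiliation: University of Toronto (Canada)
  • Degree-granting institution: University of Toronto (Canada)
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 110 p.
  • Total pages: 110
  • Original format: PDF
  • Language of text: English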
