Acoustic modeling for automatic speech recognition: Deriving discriminative Gaussian networks.

Abstract

Despite the considerable progress made in recent years, automatic speech recognition is far from being a solved problem. In particular, the accuracy of a speech recognizer degrades dramatically when there is a mismatch between the training conditions and the real usage conditions.

State-of-the-art speech recognizers use hidden Markov models (HMMs) and Gaussian mixture models (GMMs) with millions of parameters to model speech. The set of all these models is called the acoustic model set of the speech recognizer. The parameters are trained with speech from thousands of different speakers to capture the variabilities of speech. However, the current acoustic model set over-generalizes and is not able to capture certain constraints in speech that are relevant for recognition. For example, the acoustic model set does not take into account that the gender of a speaker cannot change within an utterance. Furthermore, experiments have shown that the acoustic model set is often not able to take advantage of the vastly increasing amount of training data that is now available with commercial applications.

In this work, a novel technique for deriving discriminative Gaussian networks (GNs) from training data is presented. The Gaussian networks can be viewed as HMM/GMM models that have complex HMM structures and simple, single-Gaussian GMMs. The models are iteratively grown in complexity by splitting HMM states into two states. In each iteration, the algorithm splits the states that are expected to give the most significant error rate reduction. The model parameters are discriminatively trained as well, using an improved version of the maximum mutual information (MMI) training algorithm.

Evaluations using the industry-standard Aurora 2 benchmark and a small-vocabulary recognition task show that GN acoustic models are both more accurate and more robust than comparable HMM/GMM acoustic models.
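
The iterative growth step described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the `State` class, the `expected_gain` scoring function, and the mean-perturbation split in `split_state` are all hypothetical stand-ins chosen for illustration. The abstract only says that the states expected to reduce the error rate most are split in two. Transition structure and the discriminative (MMI) retraining between iterations are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class State:
    """One HMM state modeled by a single 1-D Gaussian (illustrative only)."""
    name: str
    mean: float
    var: float


def split_state(s: State, eps: float = 0.2) -> List[State]:
    """Split a state into two by shifting the mean by +/- eps * std.

    This perturbation is an assumed heuristic, not the thesis's method.
    """
    shift = eps * s.var ** 0.5
    return [
        State(s.name + "a", s.mean - shift, s.var),
        State(s.name + "b", s.mean + shift, s.var),
    ]


def grow(
    states: List[State],
    expected_gain: Callable[[State], float],
    splits_per_iter: int,
    iterations: int,
) -> List[State]:
    """Greedily grow the model: each iteration splits the top-scoring states."""
    for _ in range(iterations):
        # Rank states by the (hypothetical) expected error-rate reduction.
        ranked = sorted(states, key=expected_gain, reverse=True)
        chosen = {id(s) for s in ranked[:splits_per_iter]}
        grown: List[State] = []
        for s in states:
            if id(s) in chosen:
                grown.extend(split_state(s))
            else:
                grown.append(s)
        states = grown
        # A real system would retrain all parameters here before re-scoring.
    return states


# Toy usage: variance stands in for the expected gain of splitting a state.
initial = [State("s1", 0.0, 1.0), State("s2", 5.0, 4.0)]
result = grow(initial, expected_gain=lambda s: s.var, splits_per_iter=1, iterations=2)
print([s.name for s in result])
```

In the toy usage, variance is used as a placeholder score, so the widest state is split first. A discriminative criterion would instead score each candidate split by its estimated effect on recognition errors.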
