Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Cross-Entropy Training of DNN Ensemble Acoustic Models for Low-Resource ASR

Abstract

Deep neural networks (DNNs) have shown great promise in exploiting out-of-language data, particularly for under-resourced languages. The common trend is to merge data from various source languages to train a multilingual DNN and then reuse the hidden layers as language-independent feature extractors for a low-resource target language. While there is a consensus that using as much data as possible from various source languages yields a better and more general multilingual DNN, employing only source languages similar to the target language has also proven effective. In this study, we propose a novel framework for multilingual DNN training that employs all the available training data while simultaneously exploiting complementary information from the individual source languages. Toward this goal, we borrow the idea of an ensemble with one generalist and many specialists. The generalist is derived from a multilingual DNN acoustic model trained on all available multilingual data; the specialists are DNNs derived from the individual source languages. The constituents of the ensemble are then combined using weighted averaging schemes, where the combination weights are trained to minimize the cross-entropy objective function. In this framework, we seek complementary information among the constituents while guaranteeing performance at least equal to that of the baseline. Moreover, unlike previous well-known system combination schemes, only one model is required during decoding. We examined two combination methodologies and demonstrated their usefulness in different scenarios using the multilingual GlobalPhone dataset. In particular, we observe that speech recognition systems developed in low-resource settings benefit from the proposed strategy.
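
The abstract does not give implementation details, but the core combination step it describes can be sketched as follows: the frame-level state posteriors of the frozen generalist and specialist DNNs are averaged with a small set of learnable weights, and only those weights are trained with a cross-entropy objective. The PyTorch sketch below is illustrative only; the module name `PosteriorEnsemble`, the toy feed-forward members, and all dimensions are assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): combine a "generalist" multilingual
# DNN with per-language "specialist" DNNs via learnable weighted averaging of
# their frame-level state posteriors; only the combination weights are trained
# with a cross-entropy objective. All names and sizes below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PosteriorEnsemble(nn.Module):
    def __init__(self, members):
        super().__init__()
        # Frozen constituents: one generalist + several specialists.
        self.members = nn.ModuleList(members)
        for m in self.members:
            for p in m.parameters():
                p.requires_grad_(False)
        # One scalar weight per constituent; a softmax keeps the
        # combined posteriors a proper probability distribution.
        self.logits = nn.Parameter(torch.zeros(len(members)))

    def forward(self, feats):
        w = F.softmax(self.logits, dim=0)                        # (M,)
        posts = torch.stack([F.softmax(m(feats), dim=-1)
                             for m in self.members], dim=0)      # (M, B, S)
        return torch.einsum("m,mbs->bs", w, posts)               # (B, S)

# Toy usage: three feed-forward members over 40-dim features, 120 states.
members = [nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 120))
           for _ in range(3)]
model = PosteriorEnsemble(members)
opt = torch.optim.SGD([model.logits], lr=0.1)

feats = torch.randn(8, 40)                 # a batch of acoustic frames
targets = torch.randint(0, 120, (8,))      # toy state labels
for _ in range(10):
    opt.zero_grad()
    combined = model(feats)
    # Cross-entropy on the combined posteriors updates only the weights.
    loss = F.nll_loss(torch.log(combined + 1e-8), targets)
    loss.backward()
    opt.step()
```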
