Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Cross-Entropy Training of DNN Ensemble Acoustic Models for Low-Resource ASR

Abstract

Deep neural networks (DNNs) have shown great promise in exploiting out-of-language data, particularly for under-resourced languages. The common trend is to merge data from various source languages to train a multilingual DNN and then reuse the hidden layers as language-independent feature extractors for a low-resource target language. While there is a consensus that using as much data as possible from various source languages yields a better and more general multilingual DNN, employing only source languages similar to the target language has also proven effective. In this study, we propose a novel framework for multilingual DNN training that employs all the available training data while simultaneously exploiting complementary information from the individual source languages. Toward this goal, we borrow the idea of an ensemble with one generalist and many specialists. The generalist is derived from a multilingual DNN acoustic model trained on all available multilingual data; the specialists are DNNs derived from the individual source languages. The constituents of the ensemble are then combined using weighted averaging schemes, where the combination weights are trained to minimize the cross-entropy objective function. In this framework, we seek complementary information among the constituents while guaranteeing performance at least equal to that of the baseline. Moreover, unlike previous well-known system combination schemes, only one model is required during decoding. We examined two combination methodologies and demonstrated their usefulness in different scenarios using the multilingual GlobalPhone dataset. In particular, we observe that speech recognition systems developed in low-resource settings benefit from the proposed strategy.
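
The abstract does not give implementation details, but the core combination step it describes can be sketched as follows: the frame-level state posteriors of the frozen generalist and specialist DNNs are averaged with a small set of learnable weights, and only those weights are trained with a cross-entropy objective. The PyTorch sketch below is illustrative only; the module name `PosteriorEnsemble`, the toy feed-forward members, and all dimensions are assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): combine a "generalist" multilingual
# DNN with per-language "specialist" DNNs via learnable weighted averaging of
# their frame-level state posteriors; only the combination weights are trained
# with a cross-entropy objective. All names and sizes below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PosteriorEnsemble(nn.Module):
    def __init__(self, members):
        super().__init__()
        # Frozen constituents: one generalist + several specialists.
        self.members = nn.ModuleList(members)
        for m in self.members:
            for p in m.parameters():
                p.requires_grad_(False)
        # One scalar weight per constituent; a softmax keeps the
        # combined posteriors a proper probability distribution.
        self.logits = nn.Parameter(torch.zeros(len(members)))

    def forward(self, feats):
        w = F.softmax(self.logits, dim=0)                        # (M,)
        posts = torch.stack([F.softmax(m(feats), dim=-1)
                             for m in self.members], dim=0)      # (M, B, S)
        return torch.einsum("m,mbs->bs", w, posts)               # (B, S)

# Toy usage: three feed-forward members over 40-dim features, 120 states.
members = [nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 120))
           for _ in range(3)]
model = PosteriorEnsemble(members)
opt = torch.optim.SGD([model.logits], lr=0.1)

feats = torch.randn(8, 40)                 # a batch of acoustic frames
targets = torch.randint(0, 120, (8,))      # toy state labels
for _ in range(10):
    opt.zero_grad()
    combined = model(feats)
    # Cross-entropy on the combined posteriors updates only the weights.
    loss = F.nll_loss(torch.log(combined + 1e-8), targets)
    loss.backward()
    opt.step()
```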
