Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition

WANG G.; Sim K.C.

Abstract

The data sparsity problem is addressed by using the decision tree state clusters as the training targets for the state-of-the-art context-dependent (CD) deep neural network (DNN) systems. The CD states within a cluster cannot be distinguished at the frame level. We surmise that the state clustering may cause an issue for the standard CD-DNNs, which has so far not been addressed in the literature. In this paper, a logistic regression framework is proposed for the CD-DNNs based on a set of broad phone classes to address both the data sparsity and the clustering problems. To address the data sparsity issue, the triphones are clustered into shorter biphones with broad phone contexts under multiple articulatory categories. A DNN is trained to discriminate the disjoint biphone clusters within each articulatory category. The regression bases are formed by the concatenated log posterior probabilities of all the broad phone DNNs. Logistic regression is used to transform the regression bases into the triphone state posteriors. Clustering of the regression parameters is used to reduce the regression model complexity while still achieving unique acoustic scores for all possible triphones. Based on some approximations, the regression model can be trained as a sparse softmax layer and its parameters can be learned by optimizing the cross-entropy criterion. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD-DNN significantly outperforms the standard CD-DNN. The best system provides a 1.3% absolute word error rate reduction compared to the best standard CD-DNN system.
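The pipeline described in the abstract (concatenated log posteriors as regression bases, followed by a sparse softmax layer trained with cross-entropy) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy dimensions, the weight values, and the particular sparsity pattern (each triphone state connected to one basis entry per articulatory category) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def regression_basis(category_logits):
    """Concatenate the log posteriors of every broad phone DNN.

    category_logits holds one list of output logits per articulatory
    category (hypothetical stand-ins for the DNN outputs).
    """
    basis = []
    for logits in category_logits:
        basis.extend(log_softmax(logits))
    return basis


def triphone_state_posteriors(basis, weights, biases):
    """Softmax (multinomial logistic regression) over triphone states.

    weights[s] is a sparse row stored as {basis_index: weight}; the
    sparsity pattern used here is an assumption, not taken from the paper.
    """
    scores = [
        b + sum(w * basis[i] for i, w in row.items())
        for row, b in zip(weights, biases)
    ]
    return [math.exp(v) for v in log_softmax(scores)]


def cross_entropy(posteriors, target):
    """Frame-level cross-entropy loss for one labelled triphone state."""
    return -math.log(posteriors[target])


# Toy frame: two articulatory categories with 2 and 3 biphone clusters.
frame_logits = [[2.0, 0.5], [0.1, 1.2, -0.3]]
basis = regression_basis(frame_logits)  # 5-dimensional regression basis

# Three toy triphone states, each tied to one cluster per category.
weights = [{0: 1.0, 3: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0, 4: 1.0}]
biases = [0.0, 0.0, 0.0]

posteriors = triphone_state_posteriors(basis, weights, biases)
loss = cross_entropy(posteriors, target=0)
print(posteriors, loss)
```

In this sketch, distinct sparse rows give every triphone state its own score even when some states share individual basis entries, which loosely mirrors the abstract's claim of unique acoustic scores for all triphones. Training would adjust `weights` and `biases` by gradient descent on the cross-entropy loss.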

Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE Transactions on, 2014, No. 11, pp. 1660-1669 (10 pages)
Authors: WANG G.; Sim K.C.
Author affiliation: Human Language Technology Department, Institute for Infocomm Research, Singapore
Original format: PDF
Language: English
Keywords: Approximation methods; Context; Context modeling; Detectors; Equations; Mathematical model; Training; Articulatory features; context dependent modeling; deep neural network; logistic regression
