Journal: Audio, Speech, and Language Processing, IEEE Transactions on

Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition



Abstract

The data sparsity problem is addressed by using the decision tree state clusters as the training targets for the state-of-the-art context-dependent (CD) deep neural network (DNN) systems. The CD states within a cluster cannot be distinguished at the frame level. We surmise that the state clustering may cause an issue for the standard CD-DNNs, which has so far not been addressed in the literature. In this paper, a logistic regression framework is proposed for the CD-DNNs based on a set of broad phone classes to address both the data sparsity and the clustering problems. To address the data sparsity issue, the triphones are clustered into shorter biphones with broad phone contexts under multiple articulatory categories. A DNN is trained to discriminate the disjoint biphone clusters within each articulatory category. The regression bases are formed by the concatenated log posterior probabilities of all the broad phone DNNs. Logistic regression is used to transform the regression bases into the triphone state posteriors. Clustering of the regression parameters is used to reduce the regression model complexity while still achieving unique acoustic scores for all possible triphones. Based on some approximations, the regression model can be trained as a sparse softmax layer and its parameters can be learned by optimizing the cross-entropy criterion. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD-DNN significantly outperforms the standard CD-DNN. The best system provides a 1.3% absolute word error rate reduction compared to the best standard CD-DNN system.
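The pipeline described in the abstract (concatenated log posteriors as regression bases, followed by a sparse softmax trained with cross-entropy) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two articulatory categories, the cluster counts, the posterior values, the sparsity pattern (each triphone state connects only to the one cluster it belongs to in each category), and the helper names `regression_bases`, `sparse_softmax`, and `cross_entropy`.

```python
import math

def regression_bases(category_posteriors, floor=1e-10):
    """Concatenate the log posteriors of every broad phone DNN.

    category_posteriors: one list of cluster posteriors per articulatory
    category (hypothetical values standing in for real DNN outputs).
    """
    return [math.log(max(p, floor))
            for posteriors in category_posteriors
            for p in posteriors]

def sparse_softmax(bases, weights, biases):
    """Map regression bases to triphone state posteriors.

    weights: one dict {basis_index: weight} per triphone state, so each
    state only touches a few bases (the assumed sparsity pattern).
    """
    scores = [b + sum(w * bases[i] for i, w in row.items())
              for row, b in zip(weights, biases)]
    # Numerically stable softmax over all states.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(posteriors, target):
    """Frame-level cross-entropy loss for the reference state index."""
    return -math.log(posteriors[target])

# Toy frame: two hypothetical categories with 3 and 2 biphone clusters.
category_posteriors = [
    [0.7, 0.2, 0.1],  # category A
    [0.6, 0.4],       # category B
]
bases = regression_bases(category_posteriors)  # 5 concatenated log posteriors

# Three hypothetical triphone states, each tied to one cluster per category.
weights = [
    {0: 1.0, 3: 1.0},  # state 0: cluster A0 and cluster B0
    {0: 1.0, 4: 1.0},  # state 1: cluster A0 and cluster B1
    {1: 1.0, 3: 1.0},  # state 2: cluster A1 and cluster B0
]
biases = [0.0, 0.0, 0.0]

posteriors = sparse_softmax(bases, weights, biases)
loss = cross_entropy(posteriors, target=0)
```

In this toy setup, states 0 and 1 share the same category A cluster but differ in category B, so they still receive different scores. That is the sense in which combining several coarse clusterings can separate states that a single clustering would merge. In training, the weights and biases would be updated to minimize `loss` over many frames; the sketch only evaluates a single forward pass.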
