Journal: Systems and Computers in Japan

Simultaneous Clustering of Phonetic Context, Dimension, and State Position for Acoustic Modeling Using Decision Trees


Abstract

Context-dependent hidden Markov models (HMMs), which take a phone's preceding and succeeding phonetic context into account, have recently become widely used as acoustic models in continuous speech recognition systems. However, using context-dependent HMMs increases the total number of models, producing a system with an extremely large number of free parameters that are difficult to estimate reliably from the observed statistics. For this reason, parameter-tying methods, in which parameters are shared between models, have been proposed. Of these, tying states on the basis of decision trees has proved to be a particularly good solution. However, such methods typically treat all dimensions of the feature vector as a single unit for each state and tie all dimensions simultaneously. As a result, it is not possible to construct a different parameter-sharing structure for each individual dimension, or to assign an appropriate number of parameters to each one. Here, we introduce a method for partitioning the feature dimensions on the basis of the minimum description length criterion and use it to extend phonetic decision trees, proposing a decision tree clustering method that accommodates both phones and dimensions. In addition, by adding a partition condition related to state position, we propose a method for simultaneously clustering phonetic context, dimension, and state position using decision trees. We show that, in speaker-independent continuous speech recognition, the proposed method reduces the error rate by 13 to 15 percent compared with previous state-tying methods based on phonetic decision trees.
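The abstract does not give the formulas used, so the following is only a rough sketch of how a minimum description length test could decide, one feature dimension at a time, whether a cluster of states should be split by a phonetic question. It assumes a single Gaussian per state and dimension and the common two-part form of description length (negative log-likelihood plus half the number of free parameters times the log of the total occupancy). The names `pool`, `gaussian_loglik`, `mdl_delta`, and the toy statistics are hypothetical and are not taken from the paper.

```python
import math

def gaussian_loglik(occ, var):
    """Log-likelihood of `occ` frames under a 1-D Gaussian with ML variance `var`."""
    return -0.5 * occ * (math.log(2.0 * math.pi * var) + 1.0)

def pool(stats):
    """Merge (occupancy, mean, variance) triples of one dimension into one Gaussian."""
    occ = sum(g for g, _, _ in stats)
    mean = sum(g * m for g, m, _ in stats) / occ
    second_moment = sum(g * (v + m * m) for g, m, v in stats) / occ
    return occ, mean, second_moment - mean * mean

def mdl_delta(yes, no, total_occ):
    """Change in description length when one dimension's cluster is split
    into the `yes` and `no` answers of a question. Negative means the split
    shortens the description and would be accepted under this assumed criterion."""
    occ_p, _, var_p = pool(yes + no)
    occ_y, _, var_y = pool(yes)
    occ_n, _, var_n = pool(no)
    gain = (gaussian_loglik(occ_y, var_y) + gaussian_loglik(occ_n, var_n)
            - gaussian_loglik(occ_p, var_p))
    # One extra mean and one extra variance: (2 / 2) * log(total occupancy).
    penalty = math.log(total_occ)
    return penalty - gain

# Toy statistics (hypothetical): one state per answer, two feature dimensions.
# Each entry is (occupancy, mean, variance) for that state and dimension.
dim0_yes, dim0_no = [(100.0, 0.0, 1.0)], [(100.0, 4.0, 1.0)]   # means differ a lot
dim1_yes, dim1_no = [(100.0, 0.0, 1.0)], [(100.0, 0.05, 1.0)]  # means nearly equal

delta_dim0 = mdl_delta(dim0_yes, dim0_no, total_occ=200.0)
delta_dim1 = mdl_delta(dim1_yes, dim1_no, total_occ=200.0)
split_dim0 = delta_dim0 < 0.0
split_dim1 = delta_dim1 < 0.0
```

With these toy numbers the same question is accepted for dimension 0 and rejected for dimension 1, which illustrates why a per-dimension tying structure can assign a different number of parameters to each dimension than tying all dimensions together.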
