
Graphical models for large vocabulary speech recognition.


Abstract

This thesis presents triangulation methodology and new graphical models for automatic speech recognition. The improved triangulation techniques presented here can lower the computational cost of exact probabilistic inference in graphical models. The focus is on finding triangulations of the graphical models used in speech and language applications. The triangulation procedures developed in the graphical model community do not address two aspects of such graphs.

The first aspect is that the graphs have a high degree of determinism. It is shown that in the presence of determinism the optimal triangulation can lie completely outside the search space of the most widely adopted triangulation techniques. It is also demonstrated that when determinism is present, certain large-clique triangulations can outperform triangulations with smaller clique sizes. This runs counter to the conventional wisdom that triangulations minimizing clique size are always most desirable. Ancestral pairs are presented as the basis for novel triangulation heuristics, and it is proven that only the addition of edges between ancestral pairs needs to be considered when searching for state-space-optimal triangulations. A genetic algorithm for large-clique triangulations is also presented. Empirical results are given on random and real-world graphs. A number of theoretical results are also presented, including an algorithm for determining whether a triangulation can be obtained via the elimination algorithm.

The second aspect is that speech graphs are variable length and have a repeating structure. Triangulation techniques are developed that are not limited by the repeating structure as defined by the graph designer.

The second goal of this thesis is to develop novel graphical models for improving recognition performance. A set of models is presented that enhance the standard model with information about syllabic segmentations. This segmentation information comes in the form of syllable nuclei locations. Using estimated locations, the graph gives improved discrimination between speech and noise compared to a baseline model. Using locations derived from oracle information, an overall improvement is obtained, and when the oracle syllable nuclei information is augmented with information about lexical stress, it gives additional improvements over locations alone.
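As a rough illustration of the vocabulary used in the abstract, the sketch below triangulates a small undirected graph with the standard elimination algorithm and scores the result by total clique state space (the sum, over maximal cliques, of the product of the variable cardinalities). It is not taken from the thesis: the function names, the four-node cycle, and the cardinalities are all hypothetical, and it does not model determinism or ancestral pairs. It only shows that two triangulations with the same maximum clique size can differ widely in state-space cost.

```python
from itertools import combinations


def triangulate_by_elimination(adjacency, order):
    """Eliminate nodes in `order`; return (fill_edges, maximal_cliques)."""
    adj = {v: set(ns) for v, ns in adjacency.items()}  # working copy
    fill_edges = set()
    candidates = []
    for v in order:
        nbrs = adj[v]
        # Connect every pair of not-yet-eliminated neighbours of v.
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill_edges.add((a, b))
        # v together with its remaining neighbours forms a clique.
        candidates.append(frozenset(nbrs | {v}))
        for n in nbrs:
            adj[n].discard(v)
        del adj[v]
    # Keep only cliques that are not strictly contained in another one.
    cliques = [c for c in candidates if not any(c < d for d in candidates)]
    return fill_edges, cliques


def state_space(cliques, cardinality):
    """Sum over cliques of the product of member cardinalities."""
    total = 0
    for clique in cliques:
        size = 1
        for v in clique:
            size *= cardinality[v]
        total += size
    return total


# Hypothetical example: the 4-cycle A-B-C-D-A with uneven cardinalities.
graph = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
card = {"A": 2, "B": 10, "C": 2, "D": 10}

fill_1, cliques_1 = triangulate_by_elimination(graph, ["A", "B", "C", "D"])
fill_2, cliques_2 = triangulate_by_elimination(graph, ["B", "A", "C", "D"])

cost_1 = state_space(cliques_1, card)  # fill edge B-D -> 400
cost_2 = state_space(cliques_2, card)  # fill edge A-C -> 80
print(fill_1, cost_1)
print(fill_2, cost_2)
```

Both elimination orders add a single fill edge and produce cliques of three variables, yet the state-space costs differ by a factor of five. This is why a cost based on cardinalities, rather than clique size alone, is the natural objective when variables have very different numbers of states.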

Bibliographic record

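  • Author: Bartels, Chris Dennis
  • Author affiliation: University of Washington
  • Degree-granting institution: University of Washington
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 189 p.
  • Total pages: 189
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Radio electronics; telecommunications technology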