Graphical models for large vocabulary speech recognition.

Abstract

This thesis presents triangulation methodology and new graphical models for automatic speech recognition. The improved triangulation techniques presented here can lower the computational cost of exact probabilistic inference in graphical models. The thesis focuses on finding triangulations of graphical models used in speech and language applications. The triangulation procedures developed in the graphical model community do not address two aspects of such graphs.

The first aspect is that the graphs have a high degree of determinism. It is shown that in the presence of determinism the optimal triangulation can lie completely outside the search space of the most widely adopted triangulation techniques. It is also demonstrated that when determinism is present, certain large-clique graph triangulations can outperform triangulations with smaller clique sizes. This runs counter to the conventional wisdom that triangulations that minimize clique size are always most desirable. Ancestral pairs are presented as the basis for novel triangulation heuristics, and it is proven that only the addition of edges between ancestral pairs needs to be considered when searching for state-space-optimal triangulations. A genetic algorithm for large-clique triangulations is also presented. Empirical results are given on random and real-world graphs. A number of theoretical results are also presented, including an algorithm for determining whether a triangulation can be obtained via the elimination algorithm.

The second aspect is that speech graphs are variable length and have a repeating structure. Triangulation techniques are developed that are not limited by the repeating structure as defined by the graph designer.

The second goal of this thesis is to develop novel graphical models for improving recognition performance. A set of models is presented that enhance the standard model with information about syllabic segmentations. This segmentation information comes in the form of syllable nuclei locations. With estimated locations, the graph gives improved discrimination between speech and noise compared to a baseline model. With locations derived from oracle information, the model gives an overall improvement, and augmenting the oracle syllable nuclei information with information about lexical stress gives additional improvements over locations alone.
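The abstract refers to triangulations obtained "via the elimination algorithm". As a rough illustration of that standard procedure (not the thesis's own heuristics, which are not reproduced in this record), the following Python sketch triangulates an undirected graph by eliminating vertices in a given order and pairs it with the common min-fill ordering heuristic. The function names and the four-node example graph are assumptions made for this sketch.

```python
from itertools import combinations


def min_fill_order(adjacency):
    """Greedy min-fill heuristic: repeatedly eliminate the vertex whose
    elimination adds the fewest fill-in edges (ties broken by name)."""
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order = []
    while graph:
        def fill_count(v):
            # Number of neighbour pairs of v that are not yet connected.
            return sum(1 for a, b in combinations(graph[v], 2)
                       if b not in graph[a])

        v = min(sorted(graph), key=fill_count)
        order.append(v)
        nbrs = graph.pop(v)
        # Connect the neighbours of v pairwise, then drop v from the graph.
        for a, b in combinations(nbrs, 2):
            graph[a].add(b)
            graph[b].add(a)
        for n in nbrs:
            graph[n].discard(v)
    return order


def triangulate_by_elimination(adjacency, order):
    """Eliminate vertices in `order`; return the fill-in edges that make
    the graph chordal and the maximal cliques formed along the way."""
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill_in = set()
    cliques = []
    for v in order:
        nbrs = graph[v]
        for a, b in combinations(sorted(nbrs), 2):
            if b not in graph[a]:
                graph[a].add(b)
                graph[b].add(a)
                fill_in.add((a, b))
        # v together with its remaining neighbours forms a clique.
        cliques.append(frozenset(nbrs | {v}))
        for n in nbrs:
            graph[n].discard(v)
        del graph[v]
    # Keep only cliques that are not strict subsets of another clique.
    maximal = [c for c in cliques if not any(c < d for d in cliques)]
    return fill_in, maximal


# Hypothetical example: a four-node cycle A-B-C-D-A, which is not chordal.
cycle = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C"},
}
order = min_fill_order(cycle)
fill_in, cliques = triangulate_by_elimination(cycle, order)
print(order, fill_in, [sorted(c) for c in cliques])
```

A heuristic like this searches only over elimination orders and favours small cliques. The abstract's claim is that, for graphs with heavy determinism, the best triangulation may fall outside what such a search can reach.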


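Bibliographic details

Author: Bartels, Chris Dennis
Author affiliation: University of Washington
Degree-granting institution: University of Washington
Discipline: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2008
Pages: 189 p.
Total pages: 189
Original format: PDF
Language: English (eng)
Chinese Library Classification: Radio electronics, telecommunications technology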

Similar literature


1. Modeling word-level rate-of-speech variation in large vocabulary conversational speech recognition [J]. Jing Zheng, Horacio Franco, Andreas Stolcke. Speech Communication. 2003, No. 2-3.
2. An improved two-stage mixed language model approach for handling out-of-vocabulary words in large vocabulary continuous speech recognition [J]. Bert Reveil, Kris Demuynck, Jean-Pierre Martens. Computer Speech and Language. 2014, No. 1.
3. Improving language models for radiology speech recognition [J]. Paulett JM, Langlotz CP. Journal of Biomedical Informatics. 2009, No. 1.
4. Model-based compensation of the additive noise for continuous speech recognition. Experiments using the AURORA II database and tasks [C]. J. C. Segura, A. de la Torre, M. C. Benitez. European Conference on Speech Communication and Technology. 2001.
5. Modeling lexical tones for Mandarin large vocabulary continuous speech recognition [D]. Lei, Xin. 2006.
6. Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition [O]. Edvin Pakoci, Branislav Popović, Darko Pekar. 2019.
7. Sparse Gaussian graphical models for speech recognition [O]. Bell, Peter; King, Simon. 2007.
8. Vocabulary and Environment Adaptation in Vocabulary-Independent Speech Recognition [R]. Hon, H.; Lee, K. 1992.
