In this work, the application of across-word phoneme models in large vocabulary continuous speech recognition is studied. A recognition system will be developed which allows for the training of high-performance across-word phoneme models, the efficient application of these models in combination with long-span language models in a single search pass, and the construction of word graphs. In contrast to within-word phoneme models, which model the context dependency of the phonemes representing the words in the vocabulary only within the words and fall back to a reduced phonetic context at word boundaries, across-word phoneme models capture this context dependency across word boundaries as well. As has been known for many years, this results in significant word error rate improvements, but also in considerably higher computational effort.

Today, across-word phoneme models are applied by a number of groups. However, the published descriptions of these recognition systems are often quite general, and many implementation details needed for the successful application of across-word phoneme models are missing. In this work, all details of the transformation of a baseline within-word model system into an across-word model system will be discussed. It will be analyzed in detail how the introduction of across-word phoneme models affects the word error rate, runtime, and memory requirements of the recognition system.

First, the across-word model paradigm will be integrated into the very general Bayes' decision rule which forms the basis of speech recognition. Taking into account all model assumptions and approximations needed for the application of across-word models, a specialized decision rule will be derived. Based on this specialized decision rule, the across-word model system will be developed. Compared to the baseline within-word model system, the introduction of across-word phoneme models results in a significantly more complex search network.
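The Bayes' decision rule referred to above can be stated in its standard form; the notation below (a word sequence w_1^N, acoustic observation vectors x_1^T) is the common speech-recognition convention and is used here only as an illustrative sketch, not as the specialized rule derived in this work:

```latex
% Bayes' decision rule: choose the word sequence w_1^N that maximizes
% the posterior probability given the acoustic observations x_1^T.
\hat{w}_1^{N} = \operatorname*{argmax}_{w_1^{N}} \; p(w_1^{N} \mid x_1^{T})
             = \operatorname*{argmax}_{w_1^{N}} \; \underbrace{p(w_1^{N})}_{\text{language model}} \cdot \underbrace{p(x_1^{T} \mid w_1^{N})}_{\text{acoustic model}}
```

The acoustic model term is where the phoneme models enter: with across-word models, the probability of the phonemes at a word boundary depends on the neighboring words, which couples the two factors more tightly during search.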
The efficient application of across-word phoneme models in combination with long-span language models in a single search pass requires a careful design of both the search network and the search algorithm, which will be discussed in detail. In contrast to the baseline within-word model training, the phonetic representation of the training utterances is no longer unique if across-word models are to be trained. Furthermore, the parameterization of the baseline within-word model training should be modified in order to obtain optimally performing across-word models. Finally, the introduction of across-word models also affects the construction of word graphs.

In order to further optimize the runtime of the developed across-word model search, several acceleration methods will be applied, some of which have already been discussed for within-word model systems in the literature. In addition, methods for further increasing the accuracy of across-word models, based on refined pronunciation modeling, will be studied. The developed across-word model system will finally be evaluated on three different speech corpora by comparing its recognition results to those of the baseline within-word model system. On two of the corpora, these results will also be compared to results of other research groups, as published in the literature. It will be seen that the developed recognition system produces state-of-the-art word error rates.
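The boundary effect described above, and the reason the phonetic representation of a training utterance is no longer unique once the word context matters, can be sketched with a toy triphone expansion. The phoneme symbols, the example words, and the helper functions below are illustrative assumptions, not the actual lexicon or context-expansion code of the system discussed here:

```python
# Sketch: within-word vs. across-word triphone expansion (illustrative).
# A word is a list of phoneme symbols; "#" marks a reduced (empty) context.

def within_word_triphones(words):
    """Expand each word independently: phonemes at word boundaries
    only see a reduced context ('#'), regardless of the neighbors."""
    result = []
    for phones in words:
        for i, p in enumerate(phones):
            left = phones[i - 1] if i > 0 else "#"
            right = phones[i + 1] if i < len(phones) - 1 else "#"
            result.append((left, p, right))
    return result

def across_word_triphones(words):
    """Expand the utterance as one phoneme string: boundary phonemes
    take their context from the neighboring word, so the expansion
    of a word depends on which words surround it."""
    phones = [p for w in words for p in w]
    result = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else "#"
        right = phones[i + 1] if i < len(phones) - 1 else "#"
        result.append((left, p, right))
    return result

# "this is" as (assumed) phoneme sequences:
utterance = [["dh", "ih", "s"], ["ih", "z"]]
print(within_word_triphones(utterance))
print(across_word_triphones(utterance))
```

Note that the across-word expansion of `["dh", "ih", "s"]` changes whenever the following word changes, while the within-word expansion is always the same; this is exactly why the search network grows and why the training alignment is no longer unique.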