
Word posterior probabilities for large vocabulary continuous speech recognition


Abstract

In this thesis, the use of word posterior probabilities for large vocabulary continuous speech recognition is investigated in a unified, statistical framework. The word posterior probabilities are directly derived from the sentence posterior probabilities, which are an essential part of Bayes' decision rule. Different approaches to the computation of these probabilities using N-best lists and word graphs are discussed, both theoretically and experimentally.

The word posterior probabilities are used as confidence measures for various applications. It is shown that these probabilities are the best confidence measure among those studied in this work. The performance of the confidence measures is evaluated in a unified framework using two evaluation metrics and five highly different speech corpora. The relative reduction of the confidence error rates with the word posterior probabilities ranges between 18.6% and 35.4%.

To demonstrate the usefulness of the suggested confidence measure, the word posterior probabilities are applied to restrict maximum-likelihood linear-regression (MLLR) adaptation to those acoustic segments with a high confidence. In doing so, incorrectly recognised parts of the transcription can be excluded from the adaptation algorithm. Using this method, the word error rate is reduced by 4.8% relative on a German spontaneous-speech test set.

In a very similar manner, the word posterior probabilities are used to train an American Broadcast News recogniser with automatically generated, i.e. recognised, transcriptions. Only those parts of the acoustic training corpus are used where the confidence of the transcription is sufficiently high. In order to bootstrap an initial low-cost speech recognition system which can be used to recognise large quantities of untranscribed speech data for training purposes, a small amount of speech is transcribed manually.
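The derivation of word posteriors from sentence posteriors can be illustrated with a minimal sketch. The thesis computes these quantities with a forward-backward pass over word graphs; the version below instead uses an N-best list, and the lattice representation (a word identified by its label and time span, hypotheses carrying joint acoustic-plus-language-model log scores) is an assumption made purely for illustration:

```python
import math
from collections import defaultdict

def sentence_posteriors(log_scores):
    # Normalise joint acoustic + language model log scores over the
    # N-best list via log-sum-exp, yielding sentence posteriors.
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]

def word_posteriors(nbest):
    # nbest: list of (hypothesis, log_score), where a hypothesis is a
    # list of (word, start_time, end_time) triples.
    # A word's posterior is the sum of the sentence posteriors of all
    # hypotheses containing it (simplification: words match only if
    # the (word, start, end) triples are identical; the thesis sums
    # over time-overlapping occurrences instead).
    posts = sentence_posteriors([score for _, score in nbest])
    acc = defaultdict(float)
    for (hyp, _), p in zip(nbest, posts):
        for entry in hyp:
            acc[entry] += p
    return dict(acc)
```

A word shared by every surviving hypothesis thus receives a posterior near 1.0 and a high confidence, while a word appearing in only a few low-scoring hypotheses receives a posterior near 0.0.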
This small speech database with the manually generated transcriptions is then used to train the low-cost speech recogniser, which in turn recognises the training corpus. The process of recognising the training corpus and of re-estimating the model parameters on the recognised transcriptions is then applied iteratively. In comparison with a fully tuned speech recognition system trained on 72 hours of manually transcribed data, the word error rates on two American Broadcast News test sets rise by only 14.6% and 16.6%, respectively.

Finally, two new sentence hypothesis scoring approaches are presented, both based on word posterior probabilities. The first approach still aims at minimising the expected number of sentence errors; here, the word posterior probabilities replace the acoustic and language model probabilities during the scoring algorithm. Using this method, the word error rates are reduced by between 1.5% and 5.1% relative on the five speech corpora used in this thesis. In the second approach, the expected number of word errors is minimised explicitly instead of the expected number of sentence errors. To this end, a cost function is used which is based on the observation that the identity of words can be compared not only on the basis of a Levenshtein alignment, but also on the basis of points in time. With this new cost function, an efficient decision rule is derived which can be evaluated very elegantly and which makes use of the word posterior probabilities. With this new decision rule, the word error rates on the different test corpora are reduced consistently by 2.3% to 5.1% relative.
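The time-based cost function suggests a decision rule that compares competing words by temporal overlap rather than by Levenshtein alignment alone. The sketch below is a strongly simplified, hypothetical rendering of that idea: candidate words carrying posteriors are grouped into overlapping time clusters, and the highest-posterior word per cluster is emitted. The greedy overlap grouping is an assumption for illustration; the thesis evaluates the decision rule directly on word graph posteriors:

```python
def time_overlap_decode(entries):
    # entries: list of (word, start_time, end_time, posterior).
    # Group words whose time spans overlap, then choose the word
    # with the maximum posterior probability within each group.
    entries = sorted(entries, key=lambda e: e[1])
    groups, cur, cur_end = [], [], 0.0
    for e in entries:
        if cur and e[1] < cur_end:
            # Overlaps the current group: extend it.
            cur.append(e)
            cur_end = max(cur_end, e[2])
        else:
            # No overlap: close the current group, start a new one.
            if cur:
                groups.append(cur)
            cur, cur_end = [e], e[2]
    if cur:
        groups.append(cur)
    # Emit the maximum-posterior word of each group.
    return [max(g, key=lambda e: e[3])[0] for g in groups]
```

Because the selection is made per time region, the rule can output a word sequence that appears in no single hypothesis of the original list, which is exactly the freedom that minimising the expected number of word errors (rather than sentence errors) requires.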

Bibliographic details

  • Author

    Wessel, Frank

  • Affiliation
  • Year 2002
  • Pages
  • Format PDF
  • Language eng
  • Classification
