Journal: IEEE Transactions on Audio, Speech, and Language Processing

Stochastic Pronunciation Modeling for Out-of-Vocabulary Spoken Term Detection

Abstract

Spoken term detection (STD) is the name given to the task of searching large amounts of audio for occurrences of spoken terms, which are typically single words or short phrases. One reason that STD is a hard task is that search terms tend to contain a disproportionate number of out-of-vocabulary (OOV) words. The most common approach to STD uses subword units. This, in conjunction with some method for predicting pronunciations of OOVs from their written form, enables the detection of OOV terms but performance is considerably worse than for in-vocabulary terms. This performance differential can be largely attributed to the special properties of OOVs. One such property is the high degree of uncertainty in the pronunciation of OOVs. We present a stochastic pronunciation model (SPM) which explicitly deals with this uncertainty. The key insight is to search for all possible pronunciations when detecting an OOV term, explicitly capturing the uncertainty in pronunciation. This requires a probabilistic model of pronunciation, able to estimate a distribution over all possible pronunciations. We use a joint-multigram model (JMM) for this and compare the JMM-based SPM with the conventional soft match approach. Experiments using speech from the meetings domain demonstrate that the SPM performs better than soft match in most operating regions, especially at low false alarm probabilities. Furthermore, SPM and soft match are found to be complementary: their combination provides further performance gains.
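
The idea of searching for every plausible pronunciation and weighting each hit by its pronunciation probability can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's actual method: the function name `spm_term_scores`, the input layout, the toy phone strings, and in particular the weighted-sum rule for combining hits are hypothetical, since the abstract does not state how per-pronunciation detections are merged. In the paper the pronunciation probabilities come from a joint-multigram model; here they are simply given as a dictionary.

```python
from collections import defaultdict


def spm_term_scores(pron_probs, detections):
    """Merge detections of alternative pronunciations into term-level scores.

    pron_probs: dict mapping a pronunciation (tuple of phones) to its
        probability P(pronunciation | written term). Hypothetical input; it
        stands in for the output of a pronunciation model.
    detections: list of (pronunciation, time_slot, confidence) tuples, one
        per hit returned by a subword search for that pronunciation.
    Returns a dict mapping each time slot to a combined score.

    Assumption: scores are combined as a probability-weighted sum over
    pronunciations. This rule is illustrative, not taken from the paper.
    """
    total = sum(pron_probs.values())
    if total <= 0:
        raise ValueError("pronunciation probabilities must sum to a positive value")

    scores = defaultdict(float)
    for pron, slot, confidence in detections:
        # Renormalise so a truncated n-best list still behaves like a distribution.
        weight = pron_probs.get(pron, 0.0) / total
        scores[slot] += weight * confidence
    return dict(scores)


# Toy example: three candidate pronunciations of one OOV term.
p1 = ("n", "ay", "k", "iy")
p2 = ("n", "ih", "k", "ey")
p3 = ("n", "ay", "k")
pron_probs = {p1: 0.6, p2: 0.3, p3: 0.1}

# Hits found in the audio index, keyed by a coarse time slot in seconds.
detections = [(p1, 12.0, 0.5), (p2, 12.0, 0.8), (p1, 47.5, 0.4)]

term_scores = spm_term_scores(pron_probs, detections)
for slot, score in sorted(term_scores.items()):
    print(f"slot {slot:>5}: combined score {score:.2f}")
```

In this toy example the slot at 12.0 s is supported by two different pronunciations and therefore scores higher than the slot at 47.5 s, which only one pronunciation matches. A system that searches for a single predicted pronunciation would not see the second hit at all.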
