Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition



Abstract

The diachronic nature of broadcast news data leads to the problem of out-of-vocabulary (OOV) words in large vocabulary continuous speech recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are proper names (PNs), and PNs are important both for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable the recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and semantic context derived from latent Dirichlet allocation (LDA) topic models, continuous word vector representations, and the neural bag-of-words (NBOW) model, which is capable of learning task-specific word and context representations. We propose a neural bag-of-weighted-words (NBOW2) model, which learns to assign higher weights to words that are important for retrieval of an OOV PN. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform the methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives faster convergence during training. Second-pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models.
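The difference between an unweighted and a weighted bag-of-words context can be illustrated with a small sketch. The abstract does not give the model equations, so everything specific below is an assumption made for illustration: the sigmoid-of-dot-product weighting with a trainable `anchor` vector, the linear-plus-softmax scoring layer, the function names, and the toy random parameters. It is not the authors' implementation and it performs no training.

```python
import math
import random


def nbow_context(word_vecs):
    """Plain bag-of-words context: unweighted mean of the word vectors."""
    dim, n = len(word_vecs[0]), len(word_vecs)
    return [sum(v[i] for v in word_vecs) / n for i in range(dim)]


def nbow2_context(word_vecs, anchor):
    """Weighted context: each word vector is scaled by a weight in (0, 1).

    ASSUMPTION: weight = sigmoid(word_vec . anchor), where `anchor` stands in
    for a trainable vector. The abstract only says the weights are learned.
    """
    weights = [
        1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(v, anchor))))
        for v in word_vecs
    ]
    dim, n = len(word_vecs[0]), len(word_vecs)
    context = [
        sum(w * v[i] for w, v in zip(weights, word_vecs)) / n for i in range(dim)
    ]
    return context, weights


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_oov_pns(context, output_rows, pn_names):
    """Score every candidate OOV PN (hypothetical linear layer) and sort."""
    scores = [sum(c * w for c, w in zip(context, row)) for row in output_rows]
    probs = softmax(scores)
    return sorted(zip(pn_names, probs), key=lambda p: p[1], reverse=True)


# Toy, randomly initialised parameters (untrained, purely illustrative).
rng = random.Random(0)
dim = 4
doc_vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
anchor = [rng.uniform(-1, 1) for _ in range(dim)]
pn_names = ["pn_a", "pn_b", "pn_c"]  # placeholder candidate names
output_rows = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in pn_names]

context, weights = nbow2_context(doc_vecs, anchor)
ranking = rank_oov_pns(context, output_rows, pn_names)
```

In this sketch, `ranking` is a list of `(name, probability)` pairs sorted from most to least likely, and `weights` shows how much each transcript word contributed to the context vector. With a zero `anchor` every weight is 0.5, so the weighted context reduces to a scaled copy of the plain average.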
