Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition

Imran Sheikh; Dominique Fohr; Irina Illina; Georges Linarès


Abstract

The diachronic nature of broadcast news data leads to the problem of out-of-vocabulary (OOV) words in large vocabulary continuous speech recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are proper names (PNs). However, PNs are important for automatic indexing of audio–video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable the recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and semantic context derived from latent Dirichlet allocation (LDA) topic models, continuous word vector representations and the neural bag-of-words (NBOW) model which is capable of learning task specific word and context representations. We propose a neural bag-of-weighted words (NBOW2) model which learns to assign higher weights to words that are important for retrieval of an OOV PN. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform the methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives a faster convergence during training. Second pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models.
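The difference between the NBOW and NBOW2 context representations mentioned in the abstract can be illustrated with a small sketch. The abstract gives no equations, so everything below is an assumption made for illustration: the function names, the toy dimensions, the sigmoid weighting against a learned `anchor` vector, and the softmax scoring over candidate OOV PNs. The parameters are random and untrained, so this shows only the forward computation, not the authors' implementation or results.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nbow_context(word_ids, embeddings):
    """NBOW-style context: plain average of the document's word vectors."""
    dim = len(embeddings[0])
    n = len(word_ids)
    return [sum(embeddings[w][d] for w in word_ids) / n for d in range(dim)]


def nbow2_context(word_ids, embeddings, anchor):
    """NBOW2-style context: average of word vectors scaled by per-word weights.

    Assumption: each weight is sigmoid(word_vector . anchor), where `anchor`
    is a hypothetical learned vector; the abstract does not specify this form.
    """
    dim = len(embeddings[0])
    n = len(word_ids)
    weights = [1.0 / (1.0 + math.exp(-dot(embeddings[w], anchor)))
               for w in word_ids]
    context = [sum(a * embeddings[w][d] for a, w in zip(weights, word_ids)) / n
               for d in range(dim)]
    return context, weights


def score_oov_pns(context, output_vectors, biases):
    """Softmax distribution over candidate OOV PNs given a context vector."""
    logits = [dot(context, o) + b for o, b in zip(output_vectors, biases)]
    return softmax(logits)


# Toy, untrained parameters (all sizes are illustrative assumptions).
rng = random.Random(0)
VOCAB, DIM, NUM_PNS = 6, 4, 3
embeddings = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
anchor = [rng.uniform(-1, 1) for _ in range(DIM)]
output_vectors = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_PNS)]
biases = [0.0] * NUM_PNS

doc = [0, 2, 2, 5]  # word indices of one in-vocabulary transcription
context, weights = nbow2_context(doc, embeddings, anchor)
print("word weights:", weights)
print("OOV PN scores:", score_oov_pns(context, output_vectors, biases))
```

In a trained model of this kind, the embeddings, the anchor vector and the output layer would be learned jointly on the retrieval task, so that words informative for a given OOV PN end up with larger weights than uninformative ones.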


Bibliographic details

Source
Audio, Speech, and Language Processing, IEEE/ACM Transactions on | 2017, Issue 3 | pp. 598-610 | 13 pages
Authors
Imran Sheikh; Dominique Fohr; Irina Illina; Georges Linarès
Author affiliations

Multispeech (Inria/CNRS/Université de Lorraine) Project-Team at LORIA, UMR 7503, Vandoeuvre-lès-Nancy, France;

Multispeech (Inria/CNRS/Université de Lorraine) Project-Team at LORIA, UMR 7503, Vandoeuvre-lès-Nancy, France;

Multispeech (Inria/CNRS/Université de Lorraine) Project-Team at LORIA, UMR 7503, Vandoeuvre-lès-Nancy, France;

Laboratoire d'Informatique d'Avignon, University of Avignon, Avignon, France;

Original format: PDF
Text language: English
Keywords
Context; Vocabulary; Context modeling; Speech recognition; Semantics; Training; Computational modeling;


