Learning Sub-Word Units for Open Vocabulary Speech Recognition

Abstract

Large-vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information-rich terms such as named entities or foreign words. Hybrid word/sub-word systems address this problem by adding sub-word units to large-vocabulary word-based systems; new words can then be represented by combinations of sub-word units. Previous work created the sub-word lexicon heuristically from phonetic representations of text, using simple statistics to select common phone sequences. We propose a probabilistic model to learn a sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (absolute) at a 5% false alarm rate on an English Broadcast News task and an MIT Lectures task, respectively.
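The abstract reports results at a fixed 5% false alarm rate. As a rough illustration of that kind of operating point, the sketch below computes the lowest missed-detection rate reachable while keeping the false alarm rate at or below a limit. The function name, the per-token score and label lists, and the threshold sweep are all assumptions made for illustration; the paper's actual scoring protocol is not described in this record.

```python
def miss_rate_at_false_alarm(scores, is_oov, max_fa_rate=0.05):
    """Lowest miss rate over thresholds whose false alarm rate is <= max_fa_rate.

    scores  -- hypothetical per-token OOV detector scores (higher = more OOV-like)
    is_oov  -- matching list of booleans, True where the token really is OOV
    """
    positives = sum(1 for y in is_oov if y)
    negatives = len(is_oov) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one OOV and one in-vocabulary token")

    # A threshold above every score flags nothing: no false alarms, all OOVs missed.
    best_miss = 1.0
    for threshold in sorted(set(scores)):
        flagged = [s >= threshold for s in scores]
        false_alarms = sum(1 for f, y in zip(flagged, is_oov) if f and not y)
        hits = sum(1 for f, y in zip(flagged, is_oov) if f and y)
        if false_alarms / negatives <= max_fa_rate:
            best_miss = min(best_miss, 1.0 - hits / positives)
    return best_miss


if __name__ == "__main__":
    # Toy data, invented for this example only.
    toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]
    toy_labels = [True, False, True, False, False, False]
    print(miss_rate_at_false_alarm(toy_scores, toy_labels, max_fa_rate=0.25))
```

An absolute reduction such as the 6.3% quoted above would correspond to the difference between two such miss rates, for a baseline and a learned lexicon, measured at the same false alarm limit.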

Bibliographic record

Source: Annual meeting of the Association for Computational Linguistics; ACL 2011 | 2012 | pp. 712-721 | 10 pages
Authors: Carolina Parada; Mark Dredze; Abhinav Sethy; Ariya Rastrow
Original format: PDF
Chinese Library Classification: Programming, software engineering
