Annual Meeting of the Association for Computational Linguistics (ACL 2011)

Learning Sub-Word Units for Open Vocabulary Speech Recognition

Abstract

Large-vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information-rich terms such as named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large-vocabulary word-based systems; new words can then be represented by combinations of sub-word units. Previous work heuristically created the sub-word lexicon from phonetic representations of text, using simple statistics to select common phone sequences. We propose a probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (absolute) at a 5% false alarm rate on an English Broadcast News and MIT Lectures task, respectively.
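
To make the heuristic baseline mentioned in the abstract concrete, here is a minimal Python sketch of selecting common phone sequences by raw frequency. It illustrates only the "simple statistics" idea attributed to previous work, not the probabilistic model the paper proposes. The function name `frequent_phone_sequences`, its parameters, the minimum sequence length of 2, and the toy pronunciations are all assumptions made for illustration.

```python
from collections import Counter


def frequent_phone_sequences(pronunciations, max_len=3, top_k=5):
    """Return the top_k most frequent phone n-grams of length 2..max_len.

    Hypothetical helper: a frequency-based stand-in for a heuristic
    sub-word lexicon, not the learned lexicon described in the abstract.
    """
    counts = Counter()
    for phones in pronunciations:
        for n in range(2, max_len + 1):
            for i in range(len(phones) - n + 1):
                counts[tuple(phones[i:i + n])] += 1
    # Sort by descending count, then lexicographically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [seq for seq, _ in ranked[:top_k]]


# Toy ARPAbet-like pronunciations, invented for this example.
toy_pronunciations = [
    ["K", "AE", "T"],  # cat
    ["B", "AE", "T"],  # bat
    ["K", "AE", "B"],  # cab
    ["T", "AE", "B"],  # tab
]

units = frequent_phone_sequences(toy_pronunciations, max_len=2, top_k=3)
print(units)
```

In a hybrid system, sequences chosen this way would be added to the word vocabulary as sub-word units, so that an unseen word can be decoded as a concatenation of units instead of being forced onto an in-vocabulary word.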
