Conference: 2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

Expansion of training texts to generate a topic-dependent language model for meeting speech recognition



Abstract

This paper proposes expansion methods for the training texts (baseline) used to generate a topic-dependent language model for more accurate recognition of meeting speech. Preparing a universal language model that can cope with the variety of topics discussed in meetings is very difficult. Our strategy is to generate topic-dependent training texts with two methods. The first is text collection from web pages using queries composed of topic-dependent confident terms; these terms are selected from preparatory recognition results based on the TF-IDF (Term Frequency, Inverse Document Frequency) value of each term. The second is text generation using participants' names. Our topic-dependent language model was generated from these new texts together with the baseline corpus. Compared with the language model trained only on the baseline corpus, the language model generated by the proposed strategy reduced the perplexity by 16.4% and the out-of-vocabulary rate by 37.5%. This improvement was also confirmed through meeting speech recognition.
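The first expansion method hinges on ranking terms from the preparatory recognition results by TF-IDF and using the top-ranked "confident terms" as web-search query keywords. The following Python snippet is a minimal illustrative sketch of that selection step, not the authors' implementation: the function name, the smoothed IDF variant, the max-over-segments scoring, and the top-k cutoff are all assumptions for illustration.

```python
# Illustrative sketch (not the paper's code) of TF-IDF-based selection of
# topic-dependent confident terms from preparatory recognition results.
import math
from collections import Counter

def tf_idf_terms(recognized_docs, top_k=10):
    """Rank terms in preparatory recognition output by TF-IDF.

    recognized_docs: list of token lists, one per recognized meeting segment.
    Returns the top_k terms to be used as web-search query keywords.
    """
    n_docs = len(recognized_docs)
    # Document frequency: number of segments in which each term appears.
    df = Counter()
    for doc in recognized_docs:
        df.update(set(doc))

    scores = {}
    for doc in recognized_docs:
        tf = Counter(doc)
        for term, count in tf.items():
            # Standard smoothed IDF; the paper may use a different variant.
            idf = math.log(n_docs / df[term]) + 1.0
            # Keep each term's best score across segments (an assumption).
            scores[term] = max(scores.get(term, 0.0), (count / len(doc)) * idf)

    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_k]]

if __name__ == "__main__":
    # Toy segments standing in for preparatory recognition output.
    segments = [
        "speech recognition meeting language model".split(),
        "topic dependent language model training".split(),
        "meeting participants discussion agenda".split(),
    ]
    print(tf_idf_terms(segments, top_k=5))
```

The selected terms would then be combined into web-search queries, and the retrieved pages would serve as topic-dependent training text to be interpolated with the baseline corpus.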


