Conference: 2012 IEEE Workshop on Spoken Language Technology

Topic n-gram count language model adaptation for speech recognition


Abstract

We introduce novel language model (LM) adaptation approaches based on the latent Dirichlet allocation (LDA) model. N-grams observed in the training set are assigned to topics using soft and hard clustering. In soft clustering, each n-gram is distributed across topics so that its counts, summed over all topics, equal its global count in the training set: the normalized topic weights of the n-gram are multiplied by the global n-gram count to form the topic n-gram counts for the respective topics. In hard clustering, each n-gram is assigned to a single topic, namely the one holding the maximum fraction of the global n-gram count; this topic is selected using the maximum topic weight for the n-gram. The topic n-gram count LMs are built from the respective topic n-gram counts and adapted using the topic weights of a development test set. We compute the average of two confidence measures: the probability of a word given a topic and the probability of a topic given a word. The average is taken over the words in the n-grams and in the development test set to form the topic weights of the n-grams and of the development test set, respectively. On the WSJ corpus, our approaches outperform some traditional approaches.
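
The soft and hard count assignment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names (`ngram_topic_weights`, `soft_topic_counts`, `hard_topic_counts`, `word_topic_score`, `full_count`) are hypothetical, and `word_topic_score` is assumed to hold a precomputed per-word, per-topic confidence score obtained from a trained LDA model. The abstract does not say whether the selected topic in hard clustering receives the full global count or only its weighted fraction, so the sketch exposes both options through the `full_count` flag.

```python
from typing import Dict, List, Sequence, Tuple

Ngram = Tuple[str, ...]
TopicCounts = List[Dict[Ngram, float]]


def ngram_topic_weights(ngram: Ngram,
                        word_topic_score: Dict[str, Sequence[float]],
                        num_topics: int) -> List[float]:
    """Average the per-word topic scores over the n-gram's words, then normalize."""
    avg = [sum(word_topic_score[w][k] for w in ngram) / len(ngram)
           for k in range(num_topics)]
    total = sum(avg)
    if total == 0.0:
        # Assumption: fall back to uniform weights when no topic has any mass.
        return [1.0 / num_topics] * num_topics
    return [a / total for a in avg]


def soft_topic_counts(global_counts: Dict[Ngram, int],
                      word_topic_score: Dict[str, Sequence[float]],
                      num_topics: int) -> TopicCounts:
    """Split each global count across topics in proportion to the topic weights."""
    counts: TopicCounts = [{} for _ in range(num_topics)]
    for ngram, count in global_counts.items():
        weights = ngram_topic_weights(ngram, word_topic_score, num_topics)
        for k, weight in enumerate(weights):
            # The per-topic counts of an n-gram sum to its global count.
            counts[k][ngram] = weight * count
    return counts


def hard_topic_counts(global_counts: Dict[Ngram, int],
                      word_topic_score: Dict[str, Sequence[float]],
                      num_topics: int,
                      full_count: bool = True) -> TopicCounts:
    """Assign each n-gram to the single topic with the maximum topic weight."""
    counts: TopicCounts = [{} for _ in range(num_topics)]
    for ngram, count in global_counts.items():
        weights = ngram_topic_weights(ngram, word_topic_score, num_topics)
        best = max(range(num_topics), key=lambda k: weights[k])
        # Assumption: full_count=True gives the topic the whole global count;
        # full_count=False gives it only the maximum weighted fraction.
        counts[best][ngram] = float(count) if full_count else weights[best] * count
    return counts


if __name__ == "__main__":
    # Toy scores and counts, invented purely for illustration.
    scores = {"stock": [0.8, 0.2], "market": [0.6, 0.4], "game": [0.1, 0.9]}
    counts = {("stock", "market"): 10, ("game",): 4}
    print(soft_topic_counts(counts, scores, 2))
    print(hard_topic_counts(counts, scores, 2))
```

Each per-topic dictionary could then be passed to a standard n-gram LM estimator to build one topic LM per topic. That step, and the adaptation with development-set topic weights, is left out of the sketch.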
