Published in: IEEE Workshop on Spoken Language Technology

Topic n-gram count language model adaptation for speech recognition

Abstract

We introduce novel language model (LM) adaptation approaches based on the latent Dirichlet allocation (LDA) model. The n-grams observed in the training set are assigned to topics using soft and hard clustering. In soft clustering, each n-gram is distributed across topics so that its counts summed over all topics equal its global count in the training set: the normalized topic weights of the n-gram are multiplied by the global n-gram count to form the topic n-gram counts for the respective topics. In hard clustering, each n-gram is assigned to a single topic with the maximum fraction of the global n-gram count for the corresponding topic; the topic is selected by the maximum topic weight for the n-gram. Topic n-gram count LMs are built from the respective topic n-gram counts and adapted using the topic weights of a development test set. The topic weights are derived from two confidence measures, the probability of a word given a topic and the probability of a topic given a word. Their average is taken over the words in an n-gram to form the topic weights of that n-gram, and over the words in the development test set to form the topic weights of that set. On the WSJ corpus, our approaches outperform some traditional approaches.
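
The count-splitting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy probability table, and the use of only P(word | topic) as the confidence measure are assumptions made for illustration. The hard-clustering variant here gives the full global count to the winning topic, which is one possible reading; the abstract could also mean that the topic receives only its maximum fraction of the count.

```python
from typing import Dict, List, Sequence

def ngram_topic_weights(
    ngram: Sequence[str],
    p_word_given_topic: List[Dict[str, float]],
) -> List[float]:
    """Average a per-word confidence measure over the words of an n-gram,
    then normalize across topics so the weights sum to 1.

    Assumption: only P(word | topic) is used; the abstract also mentions
    P(topic | word), which could be plugged in the same way.
    """
    raw = [
        sum(table.get(word, 0.0) for word in ngram) / len(ngram)
        for table in p_word_given_topic
    ]
    total = sum(raw)
    if total == 0.0:
        # No topic knows any word of this n-gram: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [value / total for value in raw]

def soft_counts(global_count: float, weights: Sequence[float]) -> List[float]:
    """Soft clustering: split the global count in proportion to the weights,
    so the topic counts sum to the global count."""
    return [global_count * w for w in weights]

def hard_counts(global_count: float, weights: Sequence[float]) -> List[float]:
    """Hard clustering: pick the topic with the maximum weight.

    Assumption: the whole global count goes to that topic.
    """
    best = max(range(len(weights)), key=lambda k: weights[k])
    return [float(global_count) if k == best else 0.0 for k in range(len(weights))]

# Toy example with two hypothetical topics.
toy_table = [
    {"stock": 0.6, "market": 0.2},  # topic 0
    {"stock": 0.2, "market": 0.2},  # topic 1
]
weights = ngram_topic_weights(("stock", "market"), toy_table)
print(soft_counts(9, weights))
print(hard_counts(9, weights))
```

With these toy values the bigram leans toward topic 0, so soft clustering gives topic 0 the larger share of the 9 occurrences while hard clustering gives it all of them. The resulting per-topic counts would then feed whatever n-gram LM estimator is used for each topic.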
