Published in: 2010 7th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Building topic mixture language models using the document soft classification notion of topic models



Abstract

We present a topic mixture language modeling approach that makes use of the soft classification notion of topic models. Given a text document set, we first perform document soft classification by applying a topic modeling process such as probabilistic latent semantic analysis (PLSA) or latent Dirichlet allocation (LDA) to the dataset. We then derive topic-specific n-gram counts from the classified texts. Finally, we build topic-specific n-gram language models (LMs) from these counts using a traditional n-gram modeling approach. During decoding we perform topic inference from the processing context and use an unsupervised topic adaptation approach to combine the topic-specific models. Experimental results show that the suggested method outperforms state-of-the-art topic-model-based unsupervised adaptation approaches.
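The pipeline in the abstract can be sketched in a few lines: soft topic posteriors weight each document's n-gram counts into per-topic count tables, and decoding interpolates the resulting topic LMs with inferred topic weights. The sketch below is illustrative only — all function names, the toy documents, the soft-assignment weights (standing in for PLSA/LDA posteriors), and the add-alpha smoothing are assumptions, not the paper's actual implementation:

```python
from collections import defaultdict

def topic_ngram_counts(docs, doc_topic_weights, num_topics):
    """Accumulate fractional bigram counts per topic, weighted by each
    document's soft topic-assignment posterior (step 2-3 of the abstract)."""
    counts = [defaultdict(float) for _ in range(num_topics)]
    for doc, weights in zip(docs, doc_topic_weights):
        tokens = doc.split()
        for w1, w2 in zip(tokens, tokens[1:]):
            for k in range(num_topics):
                counts[k][(w1, w2)] += weights[k]  # soft count, not hard assignment
    return counts

def topic_bigram_prob(counts_k, w1, w2, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram probability under one topic-specific LM
    (a stand-in for whatever smoothing the real LM toolkit would apply)."""
    num = counts_k[(w1, w2)] + alpha
    den = sum(c for (a, _), c in counts_k.items() if a == w1) + alpha * vocab_size
    return num / den

def mixture_prob(counts, topic_weights, w1, w2, vocab_size):
    """Interpolate the topic LMs with topic weights inferred from context."""
    return sum(lam * topic_bigram_prob(counts[k], w1, w2, vocab_size)
               for k, lam in enumerate(topic_weights))

# Toy corpus with hypothetical soft posteriors (topic 0 ~ finance, topic 1 ~ sports).
docs = ["the stock market rose", "the team won the game"]
soft_weights = [[0.9, 0.1], [0.2, 0.8]]
counts = topic_ngram_counts(docs, soft_weights, num_topics=2)
vocab_size = len({w for d in docs for w in d.split()})
# Decoding-time mixture with topic weights inferred from the processing context.
p = mixture_prob(counts, [0.7, 0.3], "the", "stock", vocab_size)
```

A hard-classification baseline would assign each document entirely to its top topic before counting; the soft counts above let a mixed-topic document contribute to every topic LM in proportion to its posterior, which is the distinction the paper's approach rests on.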


