Published in: 2010 7th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Building topic mixture language models using the document soft classification notion of topic models



Abstract

We present a topic mixture language modeling approach that makes use of the soft classification notion of topic models. Given a text document set, we first perform document soft classification by applying a topic modeling process such as probabilistic latent semantic analysis (PLSA) or latent Dirichlet allocation (LDA) to the dataset. We then derive topic-specific n-gram counts from the classified texts. Finally, we build topic-specific n-gram language models (LMs) from these counts using a traditional n-gram modeling approach. During decoding we perform topic inference from the processing context and use an unsupervised topic adaptation approach to combine the topic-specific models. Experimental results show that the suggested method outperforms state-of-the-art topic-model-based unsupervised adaptation approaches.
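The pipeline in the abstract can be sketched in a few lines: soft topic posteriors weight each document's n-gram counts into per-topic count tables, and decoding interpolates the resulting topic LMs with inferred topic weights. The sketch below is illustrative only — all function names, the toy documents, the soft-assignment weights (standing in for PLSA/LDA posteriors), and the add-alpha smoothing are assumptions, not the paper's actual implementation:

```python
from collections import defaultdict

def topic_ngram_counts(docs, doc_topic_weights, num_topics):
    """Accumulate fractional bigram counts per topic, weighted by each
    document's soft topic-assignment posterior (step 2-3 of the abstract)."""
    counts = [defaultdict(float) for _ in range(num_topics)]
    for doc, weights in zip(docs, doc_topic_weights):
        tokens = doc.split()
        for w1, w2 in zip(tokens, tokens[1:]):
            for k in range(num_topics):
                counts[k][(w1, w2)] += weights[k]  # soft count, not hard assignment
    return counts

def topic_bigram_prob(counts_k, w1, w2, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram probability under one topic-specific LM
    (a stand-in for whatever smoothing the real LM toolkit would apply)."""
    num = counts_k[(w1, w2)] + alpha
    den = sum(c for (a, _), c in counts_k.items() if a == w1) + alpha * vocab_size
    return num / den

def mixture_prob(counts, topic_weights, w1, w2, vocab_size):
    """Interpolate the topic LMs with topic weights inferred from context."""
    return sum(lam * topic_bigram_prob(counts[k], w1, w2, vocab_size)
               for k, lam in enumerate(topic_weights))

# Toy corpus with hypothetical soft posteriors (topic 0 ~ finance, topic 1 ~ sports).
docs = ["the stock market rose", "the team won the game"]
soft_weights = [[0.9, 0.1], [0.2, 0.8]]
counts = topic_ngram_counts(docs, soft_weights, num_topics=2)
vocab_size = len({w for d in docs for w in d.split()})
# Decoding-time mixture with topic weights inferred from the processing context.
p = mixture_prob(counts, [0.7, 0.3], "the", "stock", vocab_size)
```

A hard-classification baseline would assign each document entirely to its top topic before counting; the soft counts above let a mixed-topic document contribute to every topic LM in proportion to its posterior, which is the distinction the paper's approach rests on.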


