Journal: Journal of Zhejiang University. Science, A

Hierarchical topic modeling with nested hierarchical Dirichlet process

Abstract

This paper deals with the statistical modeling of latent topic hierarchies in text corpora. The height of the topic tree is assumed to be fixed, while the number of topics on each level is unknown a priori and is inferred from the data. Taking a nonparametric Bayesian approach to this problem, we propose a new probabilistic generative model based on the nested hierarchical Dirichlet process (nHDP) and present a Markov chain Monte Carlo sampling algorithm for inferring the topic tree structure as well as the word distribution of each topic and the topic distribution of each document. Our theoretical analysis and experimental results show that this model can produce a more compact hierarchical topic structure and capture more fine-grained topic relationships than the hierarchical latent Dirichlet allocation model.
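
The abstract's point that the number of topics is not fixed in advance can be illustrated with the Chinese restaurant process, the standard sequential view of a single Dirichlet process. The sketch below is only that building block, not the paper's nHDP model or its Markov chain Monte Carlo sampler. The names `crp_assignments` and `alpha` are chosen here for illustration and do not come from the paper.

```python
import random


def crp_assignments(num_customers, alpha, seed=0):
    """Seat customers one at a time under a Chinese restaurant process.

    Illustrative only: a single Dirichlet process, not the nHDP model.
    `alpha` (> 0) is the concentration parameter; larger values make
    new tables (clusters) more likely. Returns one table index per customer.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of customers already at table k
    assignments = []
    for _ in range(num_customers):
        # An existing table k is chosen with weight counts[k];
        # a brand-new table is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1     # join an existing table
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    seats = crp_assignments(num_customers=1000, alpha=1.0)
    print("tables used:", len(set(seats)))
```

The number of occupied tables is not an input; it emerges from the seating process and tends to grow slowly as more customers arrive. In a topic-model reading, tables play the role of topics, which is the sense in which the count is inferred from the data.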
