Conference: AAAI Conference on Artificial Intelligence

The Hybrid Nested/Hierarchical Dirichlet Process and Its Application to Topic Modeling with Word Differentiation

Abstract

The hierarchical Dirichlet process (HDP) is a powerful nonparametric Bayesian approach to modeling groups of data that allows the mixture components in each group to be shared. However, in many cases the groups themselves belong to latent groups (categories), which can strongly affect the modeling. To exploit this unknown category information in grouped data, we present the hybrid nested/hierarchical Dirichlet process (hNHDP), a prior that blends the desirable aspects of both the HDP and the nested Dirichlet process (NDP). Specifically, we introduce a clustering structure for the groups. The prior distribution for each cluster is a realization of a Dirichlet process. Moreover, the cluster-specific distributions can share part of their atoms between groups, and the shared atoms and the cluster-specific atoms are generated separately. We apply the hNHDP to document modeling and introduce a mechanism to identify discriminative words and topics. We derive an efficient Markov chain Monte Carlo scheme for posterior inference and present experiments on document modeling.
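
The structure described in the abstract can be illustrated with a rough sketch. This is not the paper's construction: it assumes a truncated stick-breaking approximation to each Dirichlet process, a standard normal base measure, and a fixed mixing weight `shared_frac` between shared and cluster-specific atoms. The names `stick_breaking`, `draw_dp`, `sample_hybrid_prior`, and `shared_frac` are all hypothetical and chosen only for illustration.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass, so the weights sum to 1
    return weights


def draw_dp(alpha, truncation, base_sampler, rng):
    """Approximate draw from DP(alpha, H), returned as (atoms, weights)."""
    atoms = [base_sampler(rng) for _ in range(truncation)]
    return atoms, stick_breaking(alpha, truncation, rng)


def sample_hybrid_prior(n_groups, n_clusters, alpha, truncation,
                        shared_frac, seed=0):
    """Sketch of a prior with shared and cluster-specific atoms.

    Assumption: each cluster distribution is a fixed-weight mixture of one
    shared DP draw and its own DP draw; the paper may weight them differently.
    """
    rng = random.Random(seed)

    def base(r):
        return r.gauss(0.0, 1.0)  # assumed base measure H = N(0, 1)

    # Atoms shared by every cluster, generated once.
    shared_atoms, shared_w = draw_dp(alpha, truncation, base, rng)

    clusters = []
    for _ in range(n_clusters):
        # Cluster-specific atoms, generated separately from the shared ones.
        own_atoms, own_w = draw_dp(alpha, truncation, base, rng)
        atoms = shared_atoms + own_atoms
        weights = ([shared_frac * w for w in shared_w]
                   + [(1.0 - shared_frac) * w for w in own_w])
        clusters.append((atoms, weights))

    # Clustering structure over groups: each group picks one cluster.
    cluster_w = stick_breaking(alpha, n_clusters, rng)
    assignments = rng.choices(range(n_clusters), weights=cluster_w, k=n_groups)
    return shared_atoms, clusters, assignments


shared, clusters, z = sample_hybrid_prior(
    n_groups=6, n_clusters=3, alpha=1.0, truncation=5, shared_frac=0.4)
print("group -> cluster:", z)
```

In this sketch, groups assigned to the same cluster use an identical distribution, while groups in different clusters still overlap on the shared atoms. That overlap is the blend of nested and hierarchical behavior the abstract describes.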
