
Latent tree models for hierarchical topic detection


Abstract

We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms the LDA-based methods in terms of model quality and meaningfulness of topics and topic hierarchies.
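
As a rough illustration of two ideas in the abstract (binary presence/absence variables at the bottom level, and a binary latent variable that induces a soft partition of the documents), the following minimal sketch may help. It is not the authors' learning algorithm: it uses a single latent variable rather than a hierarchy, and the toy documents, the vocabulary, the function names `binarize` and `posterior_on`, and all probability values are assumptions made up for this example rather than learned from data.

```python
import math

def binarize(docs, vocab):
    """Map each document to a 0/1 vector marking which vocabulary words occur."""
    rows = []
    for text in docs:
        tokens = set(text.lower().split())
        rows.append([1 if w in tokens else 0 for w in vocab])
    return rows

def posterior_on(x, prior_on, p_word_given_on, p_word_given_off):
    """Return P(Z = 1 | x) for one binary latent variable Z whose observed
    word variables are conditionally independent given Z."""
    log_on = math.log(prior_on)
    log_off = math.log(1.0 - prior_on)
    for xi, p1, p0 in zip(x, p_word_given_on, p_word_given_off):
        log_on += math.log(p1 if xi else 1.0 - p1)
        log_off += math.log(p0 if xi else 1.0 - p0)
    # Normalize in a numerically stable way.
    m = max(log_on, log_off)
    on, off = math.exp(log_on - m), math.exp(log_off - m)
    return on / (on + off)

# Toy data (hypothetical).
docs = ["neural network training", "stock market prices", "neural network market"]
vocab = ["neural", "network", "stock", "market"]
X = binarize(docs, vocab)

# Hypothetical parameters: state Z = 1 favors the words "neural" and "network".
prior_on = 0.5
p_on = [0.9, 0.8, 0.1, 0.2]   # P(word present | Z = 1), assumed
p_off = [0.1, 0.1, 0.5, 0.5]  # P(word present | Z = 0), assumed

# Soft partition: each document belongs to the Z = 1 cluster with some probability.
soft = [posterior_on(x, prior_on, p_on, p_off) for x in X]
for text, p in zip(docs, soft):
    print(f"{p:.3f}  {text}")
```

In this sketch, documents with a high posterior for Z = 1 form the cluster that would be read as a topic about "neural" and "network". In an HLTM as described above, many such latent variables exist, and higher-level latent variables are connected to lower-level ones instead of directly to words.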
