
A Novel Document Generation Process for Topic Detection Based on Hierarchical Latent Tree Models

Abstract

In most probabilistic topic models, a document is viewed as a collection of tokens and each token is a variable whose values are all the words in a vocabulary. One exception is hierarchical latent tree models (HLTMs), where a document is viewed as a binary vector over the vocabulary and each word is regarded as a binary variable. The use of word variables allows the detection and representation of patterns of word co-occurrences and co-occurrences of those patterns qualitatively using multiple levels of latent variables, and naturally leads to a method for hierarchical topic detection. In this paper, we assume that an HLTM has been learned from binary data and we extend it to take word frequencies into consideration. The idea is to replace each binary word variable with a real-valued variable that represents the relative frequency of the word in a document. A document generation process is proposed and an algorithm is given for estimating the model parameters by inverting the generation process. Empirical results show that our method significantly outperforms the commonly used LDA-based methods for hierarchical topic detection, in terms of model quality and meaningfulness of topics and topic hierarchies.
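The abstract contrasts three ways of representing a document: as a sequence of tokens (most topic models), as a binary vector over the vocabulary (HLTMs), and as a vector of per-word relative frequencies (the proposed extension). The following is a minimal Python sketch of those three views only. The function names are hypothetical, tokenization is plain lowercase whitespace splitting, and relative frequency is assumed to be a word's count divided by the document's total token count. The paper's actual normalization, its document generation process, and its parameter estimation algorithm are not reproduced here.

```python
from collections import Counter
from typing import List


def token_view(text: str) -> List[str]:
    """Token view (most topic models): a document is a sequence of tokens,
    each token being a variable whose value is some vocabulary word.
    Assumption: naive lowercase whitespace tokenization."""
    return text.lower().split()


def binary_view(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """HLTM-style view: one binary variable per vocabulary word,
    1 if the word occurs in the document and 0 otherwise."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def relative_frequency_view(tokens: List[str], vocabulary: List[str]) -> List[float]:
    """Frequency-aware view: each binary word variable is replaced by a
    real value, the word's relative frequency in the document.
    Assumption: relative frequency = count(word) / number of tokens."""
    counts = Counter(tokens)
    n = len(tokens)
    return [counts[word] / n if n else 0.0 for word in vocabulary]


if __name__ == "__main__":
    vocabulary = ["apple", "banana", "cherry", "date"]
    tokens = token_view("Apple banana apple cherry")
    print(binary_view(tokens, vocabulary))              # [1, 1, 1, 0]
    print(relative_frequency_view(tokens, vocabulary))  # [0.5, 0.25, 0.25, 0.0]
```

In the example, the binary view records only that "apple" occurs, while the relative-frequency view also records that it accounts for half of the document's tokens. That extra information is what the extension described above is meant to take into account.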