
Hierarchical Theme and Topic Modeling


Abstract

Considering the hierarchical data groupings in a text corpus (words, sentences, and documents), we perform structural learning and infer, from a collection of documents, the latent themes of sentences and the latent topics of words. The relation between themes and topics under these different data groupings is explored through an unsupervised procedure that does not restrict the number of clusters. A tree stick-breaking process is presented to draw theme proportions for different sentences. We build a hierarchical theme and topic model that flexibly represents heterogeneous documents using Bayesian nonparametrics. Thematic sentences and topical words are then extracted. In the experiments, the proposed method is shown to be effective in building a semantic tree structure for sentences and their corresponding words. The advantage of using the tree model to select expressive sentences for document summarization is also illustrated.
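The abstract does not specify how its tree stick-breaking process is constructed. As a hedged illustration only, the sketch below draws weights over the nodes of a truncated tree using a generic two-level stick-breaking scheme. At each node, a Beta(1, alpha) draw decides how much probability mass stops there. The remaining mass is split among the node's children by ordinary stick-breaking with Beta(1, gamma). The function name `tree_stick_breaking`, the truncation parameters `depth` and `branching`, and the concentration parameters `alpha` and `gamma` are hypothetical choices for this example, not the authors' model. The truncation is an added assumption that keeps the tree finite.

```python
import numpy as np


def tree_stick_breaking(depth, branching, alpha, gamma, rng):
    """Hypothetical truncated tree stick-breaking sketch (not the paper's exact process).

    Returns a dict mapping each node path (a tuple of child indices, () for the
    root) to its probability weight. The weights over all nodes sum to 1.
    """
    weights = {}

    def visit(path, mass):
        # Fraction of the incoming mass that stops at this node; nodes at the
        # truncation depth keep all of it so no mass is lost.
        nu = 1.0 if len(path) == depth else rng.beta(1.0, alpha)
        weights[path] = mass * nu
        remaining = mass * (1.0 - nu)
        if len(path) == depth:
            return
        # Split the remaining mass among children by 1-D stick-breaking;
        # the last child takes whatever is left of the stick.
        stick = 1.0
        for k in range(branching):
            psi = 1.0 if k == branching - 1 else rng.beta(1.0, gamma)
            visit(path + (k,), remaining * stick * psi)
            stick *= 1.0 - psi

    visit((), 1.0)
    return weights


# Usage: draw tree weights, then sample one node as a sentence's theme path.
rng = np.random.default_rng(0)
node_weights = tree_stick_breaking(depth=2, branching=3, alpha=1.0, gamma=1.0, rng=rng)
paths = list(node_weights)
probs = np.array([node_weights[p] for p in paths])
theme_path = paths[rng.choice(len(paths), p=probs / probs.sum())]
print(len(node_weights), theme_path)
```

Because themes can sit at internal nodes as well as at leaves, a sampled path such as `(1,)` would correspond to a more general theme than `(1, 2)`. This is one plausible reading of how a tree prior yields a semantic hierarchy for sentences.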
