Probabilistic topic models for information retrieval and concept modeling.

Abstract

Statistical topic models are a class of probabilistic latent variable models for textual data that represent text documents as distributions over topics. These models have been shown to produce interpretable summaries of documents in the form of topics. In this dissertation, we investigate how the statistical topic modeling framework can be used for information retrieval tasks and for integrating background knowledge in the form of semantic concepts. We first describe the special-words topic models, in which a document is represented as a mixture of (i) shared topics, (ii) a special-words distribution specific to the document, and (iii) a corpus-level background distribution. We describe the utility of the special-words topic models for information retrieval tasks and illustrate a variation of the model for metadata enhancement of digital libraries with multiple corpora. We next investigate the problem of integrating background knowledge in the form of semantic concepts into the topic modeling framework. To combine data-driven topics and semantic concepts, we propose the concept-topic model, which represents a document as a distribution over data-driven topics and semantic concepts. We extend this model to the hierarchical concept-topic model to incorporate concept hierarchies into the modeling framework. For all these models, we develop learning algorithms and demonstrate their utility with experiments conducted on real-world data sets.
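To make the three-part special-words decomposition concrete, here is a minimal sketch of how a word probability could be computed and used for retrieval. It assumes each document carries fixed route weights `lam` over the three components (shared topics, document-specific special words, corpus background), so that p(w | d) = lam_T * sum_t theta_t * phi_t(w) + lam_S * psi(w) + lam_B * omega(w). The abstract does not specify this parameterization. Documents are then ranked by query likelihood, a standard language-model retrieval score that may differ from the dissertation's actual ranking method. All names (`word_prob`, `query_log_likelihood`, `rank`) and all parameter values are hypothetical, and parameter learning is not shown.

```python
import math

# Hypothetical sketch: per-document route weights `lam` (topics, special words,
# background) and all distributions are assumed already learned; learning is not shown.


def word_prob(word, lam, theta, phi, psi, omega):
    """p(w | d) marginalized over the three routes:
    lam_T * sum_t theta[t] * phi[t][w] + lam_S * psi[w] + lam_B * omega[w]
    """
    lam_t, lam_s, lam_b = lam
    p_topics = sum(th * phi_t.get(word, 0.0) for th, phi_t in zip(theta, phi))
    return lam_t * p_topics + lam_s * psi.get(word, 0.0) + lam_b * omega.get(word, 0.0)


def query_log_likelihood(query, doc, phi, omega):
    """Query-likelihood score: sum of log p(w | d) over the query words."""
    score = 0.0
    for w in query:
        p = word_prob(w, doc["lam"], doc["theta"], phi, doc["psi"], omega)
        if p == 0.0:
            return float("-inf")  # no route can generate this word
        score += math.log(p)
    return score


def rank(query, docs, phi, omega):
    """Return document ids sorted from best to worst query-likelihood score."""
    return sorted(docs,
                  key=lambda d: query_log_likelihood(query, docs[d], phi, omega),
                  reverse=True)


# Toy, made-up parameters: two shared topics over a tiny vocabulary.
phi = [{"topic": 0.5, "model": 0.5}, {"gene": 0.6, "protein": 0.4}]
omega = {"topic": 0.2, "model": 0.2, "gene": 0.2, "protein": 0.2, "the": 0.2}
docs = {
    "d1": {"lam": (0.6, 0.2, 0.2), "theta": [0.9, 0.1], "psi": {"lda": 1.0}},
    "d2": {"lam": (0.6, 0.2, 0.2), "theta": [0.1, 0.9], "psi": {"brca1": 1.0}},
}

print(rank(["topic", "model"], docs, phi, omega))  # ['d1', 'd2']
print(rank(["brca1"], docs, phi, omega))           # ['d2', 'd1']
```

The second query illustrates the role of the special-words route: a rare, document-specific term that neither the shared topics nor the background explain can still be matched to the document that contains it. A practical system would smooth these distributions rather than let unseen words drive a score to negative infinity.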