Conference on Empirical Methods in Natural Language Processing

TSDPMM: Incorporating Prior Topic Knowledge into Dirichlet Process Mixture Models for Text Clustering



Abstract

The Dirichlet process mixture model (DPMM) has great potential for detecting the underlying structure of data, and extensive studies have applied it to text clustering in terms of topics. However, due to its unsupervised nature, the resulting topic clusters are often less than satisfactory. Considering that people often have prior knowledge about which potential topics should exist in given data, we aim to incorporate such knowledge into the DPMM to improve text clustering. We propose a novel model, TSDPMM, based on a new seeded Pólya urn scheme. Experimental results on document clustering across three datasets demonstrate that our proposed TSDPMM significantly outperforms the state-of-the-art DPMM and can be applied in a lifelong learning framework.
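For context on the prior that TSDPMM modifies: the standard DPMM admits a Pólya urn (Chinese Restaurant Process) representation, in which each document joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter α. The sketch below illustrates only this generic, unseeded CRP prior, not the paper's seeded scheme; function and variable names are illustrative.

```python
import random

def crp_assignments(n_docs, alpha=1.0, seed=0):
    """Sample cluster assignments from a Chinese Restaurant Process.

    Under the Polya urn view of the DPMM prior, document i joins an
    existing cluster k with probability proportional to its size n_k,
    or starts a new cluster with probability proportional to alpha.
    """
    rng = random.Random(seed)
    sizes = []        # n_k for each existing cluster
    assignments = []  # cluster index chosen for each document
    for _ in range(n_docs):
        weights = sizes + [alpha]   # existing clusters, then a new one
        total = sum(weights)
        r = rng.uniform(0, total)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(sizes):         # last slot drawn: open a new cluster
            sizes.append(1)
        else:
            sizes[k] += 1
        assignments.append(k)
    return assignments

labels = crp_assignments(100, alpha=2.0, seed=0)
print(max(labels) + 1)  # number of clusters induced by the prior
```

TSDPMM's contribution, per the abstract, is to seed this urn with prior topic knowledge so that documents matching known topics are biased toward the corresponding clusters; the unmodified prior above clusters purely by the rich-get-richer dynamics of the urn.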


