Conference on Empirical Methods in Natural Language Processing

TSDPMM: Incorporating Prior Topic Knowledge into Dirichlet Process Mixture Models for Text Clustering



Abstract

The Dirichlet process mixture model (DPMM) has great potential for detecting the underlying structure of data, and it has been widely applied to clustering text by topic. Due to its unsupervised nature, however, the resulting topic clusters are often unsatisfactory. Observing that people often have prior knowledge about which topics should exist in given data, we aim to incorporate such knowledge into the DPMM to improve text clustering. We propose a novel model, TSDPMM, based on a new seeded Polya urn scheme. Experimental results on document clustering across three datasets demonstrate that TSDPMM significantly outperforms the state-of-the-art DPMM and can be applied in a lifelong learning framework.
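The paper's TSDPMM model itself is not reproduced here, but the vanilla DPMM baseline it extends can be sketched as a collapsed Gibbs sampler under the Chinese restaurant process, with a Dirichlet-multinomial likelihood per cluster. This is a minimal illustrative sketch, not the authors' implementation; the function name, hyperparameters, and initialization strategy are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def dpmm_cluster(docs, alpha=1.0, beta=0.05, iters=20, seed=0):
    """Illustrative collapsed Gibbs sampler for a DPMM over documents
    (Chinese restaurant process prior, Dirichlet-multinomial components).
    In a seeded variant like the paper's, prior topic knowledge could be
    injected as pseudo-counts in selected clusters; not done here."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    # Initialize each document in its own cluster.
    z = list(range(len(docs)))  # cluster assignment per document
    clusters = {i: {"n_docs": 1, "counts": Counter(d), "n_words": len(d)}
                for i, d in enumerate(docs)}
    next_id = len(docs)

    def log_pred(doc, c):
        # Sequential Dirichlet-multinomial predictive log-likelihood
        # of doc under cluster c's current word counts.
        lp, seen = 0.0, Counter()
        for j, w in enumerate(doc):
            lp += math.log((c["counts"][w] + seen[w] + beta)
                           / (c["n_words"] + j + vocab_size * beta))
            seen[w] += 1
        return lp

    for _ in range(iters):
        for i, d in enumerate(docs):
            # Remove document i from its current cluster.
            c = clusters[z[i]]
            c["n_docs"] -= 1
            c["counts"].subtract(d)
            c["n_words"] -= len(d)
            if c["n_docs"] == 0:
                del clusters[z[i]]
            # Score existing clusters (CRP weight = cluster size) and a new one.
            cand, logps = [], []
            for cid, cl in clusters.items():
                cand.append(cid)
                logps.append(math.log(cl["n_docs"]) + log_pred(d, cl))
            empty = {"n_docs": 0, "counts": Counter(), "n_words": 0}
            cand.append(next_id)
            logps.append(math.log(alpha) + log_pred(d, empty))
            # Sample a cluster proportionally (log-sum-exp for stability).
            m = max(logps)
            pick = rng.choices(cand, weights=[math.exp(l - m) for l in logps])[0]
            if pick == next_id:
                clusters[pick] = {"n_docs": 0, "counts": Counter(), "n_words": 0}
                next_id += 1
            z[i] = pick
            cl = clusters[pick]
            cl["n_docs"] += 1
            cl["counts"].update(d)
            cl["n_words"] += len(d)
    return z
```

On two lexically disjoint document groups, the sampler typically recovers the two underlying clusters; the unsupervised baseline's sensitivity to initialization and hyperparameters is exactly the weakness that seeding with prior topic knowledge is meant to address.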
