Journal: IEEE Transactions on Knowledge and Data Engineering

Dirichlet Process Mixture Model for Document Clustering with Feature Partition



Abstract

Finding the appropriate number of clusters into which documents should be partitioned is crucial in document clustering. In this paper, we propose a novel approach, DPMFP, that discovers the latent cluster structure based on the Dirichlet process mixture (DPM) model without requiring the number of clusters as input. Document features are automatically partitioned into two groups, discriminative words and nondiscriminative words, which contribute differently to document clustering. A variational inference algorithm is investigated to infer the document collection structure and the partition of document words simultaneously. Our experiments indicate that the proposed approach performs well on a synthetic data set as well as on real data sets. A comparison with state-of-the-art document clustering approaches shows that our approach is robust and effective for document clustering.
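
To illustrate why a Dirichlet process prior does not need the number of clusters as input, the following minimal sketch samples cluster labels from the Chinese restaurant process, a standard representation of that prior. It is not the DPMFP model or its variational inference algorithm: it ignores document words and the feature partition entirely. The function name `crp_assignments` and the parameters `alpha` and `seed` are assumptions made for this illustration.

```python
import random


def crp_assignments(n_docs, alpha, seed=0):
    """Sample cluster labels for n_docs items from a Chinese restaurant process.

    alpha is the concentration parameter (must be > 0); larger values
    tend to produce more clusters. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of documents currently in cluster k
    labels = []  # labels[i] = cluster index assigned to document i
    for _ in range(n_docs):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_docs=200, alpha=1.0, seed=1)
    print("clusters found:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters here is an outcome of the sampling rather than an argument, which is the property the abstract relies on. In a full mixture model, the choice weights would also be multiplied by how well each cluster explains the document's words.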
