Document Clustering via Dirichlet Process Mixture Model with Feature Selection

Abstract

One essential issue in document clustering is estimating the appropriate number of clusters into which a document collection should be partitioned. In this paper, we propose a novel approach, named DPMFS, to address this issue. The proposed approach is designed 1) to group documents into a set of clusters, with the number of clusters determined automatically by the Dirichlet process mixture model; and 2) to identify discriminative words and separate them from irrelevant noise words via a stochastic search variable selection technique. We evaluate the performance of the proposed approach on both a synthetic dataset and several real-world document datasets. A comparison between our approach and state-of-the-art document clustering approaches indicates that our approach is robust and effective for document clustering.
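The abstract states that the number of clusters is determined automatically by a Dirichlet process mixture model. To illustrate only that mechanism, the sketch below implements a standard collapsed Gibbs sampler for a Dirichlet process mixture of multinomials: a Chinese restaurant process prior combined with Dirichlet-multinomial cluster likelihoods. It is not the authors' DPMFS. It omits the stochastic search variable selection step that separates discriminative words from noise words. The function names `dp_cluster` and `log_predictive`, the concentration `alpha`, and the word-prior `beta` are all hypothetical choices made for this illustration.

```python
import math
import random
from collections import Counter


def log_predictive(doc, word_counts, total, beta, vocab_size):
    """Log Dirichlet-multinomial predictive probability of `doc` (a word->count
    mapping) given a cluster's pooled word counts. The multinomial coefficient is
    dropped because it is identical for every candidate cluster."""
    n_d = sum(doc.values())
    lp = math.lgamma(total + vocab_size * beta) - math.lgamma(total + n_d + vocab_size * beta)
    for w, c in doc.items():
        n_w = word_counts.get(w, 0)
        lp += math.lgamma(n_w + c + beta) - math.lgamma(n_w + beta)
    return lp


def dp_cluster(docs, alpha=1.0, beta=0.1, iters=50, seed=0):
    """Hypothetical collapsed Gibbs sampler for a DP mixture of multinomials.
    Returns one cluster label per document; the number of distinct labels is
    inferred by the sampler rather than fixed in advance."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    lengths = [sum(d.values()) for d in docs]
    # Start with every document in a single cluster (label 0).
    z = [0] * len(docs)
    sizes, counts, totals = {0: len(docs)}, {0: Counter()}, {0: sum(lengths)}
    for d in docs:
        counts[0].update(d)
    next_label = 1
    for _ in range(iters):
        for i, doc in enumerate(docs):
            # Take document i out of its current cluster (drop the cluster if empty).
            k = z[i]
            sizes[k] -= 1
            counts[k].subtract(doc)
            totals[k] -= lengths[i]
            if sizes[k] == 0:
                del sizes[k], counts[k], totals[k]
            # CRP prior weight (cluster size, or alpha for a new cluster)
            # times the collapsed likelihood of the document.
            labels = list(sizes)
            log_w = [math.log(sizes[c]) + log_predictive(doc, counts[c], totals[c], beta, vocab_size)
                     for c in labels]
            labels.append(None)  # None stands for "open a new cluster"
            log_w.append(math.log(alpha) + log_predictive(doc, {}, 0, beta, vocab_size))
            top = max(log_w)
            weights = [math.exp(x - top) for x in log_w]  # shift by max to avoid underflow
            choice = rng.choices(labels, weights=weights)[0]
            if choice is None:
                choice, next_label = next_label, next_label + 1
                sizes[choice], counts[choice], totals[choice] = 0, Counter(), 0
            # Put document i into the sampled cluster.
            z[i] = choice
            sizes[choice] += 1
            counts[choice].update(doc)
            totals[choice] += lengths[i]
    return z
```

Consider six toy documents: three that use only the words "a" and "b", and three that use only "x" and "y". On input like this, the sampler typically ends with two distinct labels, even though the number two was never supplied. Larger `alpha` makes opening new clusters more likely. Smaller `beta` favors clusters with more concentrated word distributions.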

Bibliographic Record

Source: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 10) | 2011 | pp. 763-771 | 9 pages
Authors: Guan Yu; Ruizhang Huang; Zhaojun Wang
Original format: PDF
Chinese Library Classification: TP311.13
Keywords: Document Clustering; Dirichlet Process Mixture Model; Feature Selection

Similar Literature

1. Dirichlet Process Mixture Model for Document Clustering with Feature Partition [J]. Huang Ruizhang, Yu Guan, Wang Zhaojun. IEEE Transactions on Knowledge and Data Engineering, 2013, Issue 8.
2. Robust simultaneous positive data clustering and unsupervised feature selection using generalized inverted Dirichlet mixture models [J]. Mohamed Al Mashrgy, Taoufik Bdiri, Nizar Bouguila. Knowledge-Based Systems, 2014, Issue MARa.
3. Simultaneous Bayesian clustering and feature selection using RJMCMC-based learning of finite generalized Dirichlet mixture models [J]. Tarek Elguebaly, Nizar Bouguila. Signal Processing, 2013, Issue 6.
4. Document Clustering via Dirichlet Process Mixture Model with Feature Selection [C]. Guan Yu, Ruizhang Huang, Zhaojun Wang. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010.
5. Dirichlet process mixture modeling: Hidden Markov mixture models and multi-task compressive sensing [D]. Qi, Yuting. 2009.
6. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures [O]. Yun Chen, Hui Yang.
7. Document Clustering via Dirichlet Process Mixture Model with Feature Selection [O]. Guan Yu, Ruizhang Huang, Zhaojun Wang. 2015.