Short text clustering based on Pitman-Yor process mixture model

Qiang Jipeng; Li Yun; Yuan Yunhao; Wu Xindong

Abstract

To find the appropriate number of clusters in short text clustering, models based on the Dirichlet Multinomial Mixture (DMM) require the maximum possible number of clusters to be specified before the actual number is inferred. However, it is difficult to choose a proper value because the true number of clusters in short texts is not known beforehand. Moreover, when DMM uses a Dirichlet process as its prior, the cluster distribution decays exponentially as the number of clusters increases. We therefore propose a novel model based on the Pitman-Yor process that captures the power-law behavior of the cluster distribution. Specifically, each text chooses either one of the active clusters or a new cluster, with probabilities derived from the Pitman-Yor Process Mixture model (PYPM). Discriminative and nondiscriminative words are identified automatically to improve text clustering. Parameters are estimated efficiently by collapsed Gibbs sampling, and experimental results show that PYPM is robust and effective compared with state-of-the-art models.
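The choice between an active cluster and a new cluster can be illustrated with the standard Pitman-Yor predictive rule (the two-parameter Chinese restaurant process). The sketch below covers only this prior term; the model described in the abstract would also weigh each option by how well the text's words fit the cluster, which is not shown. The function names, the parameter names `discount` and `concentration`, and the restriction to a positive concentration are assumptions made for this illustration, not details taken from the paper.

```python
import random


def pyp_assignment_probs(cluster_sizes, discount, concentration):
    """Prior probabilities for the next item under a Pitman-Yor process.

    Returns one probability per existing cluster, followed by the
    probability of opening a new cluster (last entry).
    Assumption for this sketch: 0 <= discount < 1 and concentration > 0.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0.0:
        raise ValueError("concentration must be positive in this sketch")
    n = sum(cluster_sizes)      # items assigned so far
    k = len(cluster_sizes)      # number of active clusters
    denom = n + concentration
    # Existing cluster j: (n_j - discount) / (n + concentration)
    probs = [(size - discount) / denom for size in cluster_sizes]
    # New cluster: (concentration + discount * k) / (n + concentration)
    probs.append((concentration + discount * k) / denom)
    return probs


def simulate_partition(num_items, discount, concentration, seed=0):
    """Draw cluster sizes for num_items items from the prior alone."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(num_items):
        probs = pyp_assignment_probs(sizes, discount, concentration)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(sizes):
            sizes.append(1)     # open a new cluster
        else:
            sizes[choice] += 1  # join an active cluster
    return sizes
```

Setting `discount` to 0 recovers the Dirichlet process rule, in which the new-cluster probability does not grow with the number of active clusters. A positive discount makes new clusters more likely as clusters accumulate, which is the source of the power-law behavior the abstract refers to.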


Bibliographic details

Source
Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies | 2018, Issue 7 | 11 pages
Authors
Qiang Jipeng; Li Yun; Yuan Yunhao; Wu Xindong
Author affiliations

Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China

Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China

Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China

Hefei Univ Technol, Dept Comp Sci, Hefei, Anhui, Peoples R China

Indexing information
Original format: PDF
Text language: English
Chinese Library Classification: Automation technology, computer technology
Keywords
LDA; Pitman-Yor process; Short text clustering

