Conference on Education Technology and Information Systems

A Semi-Supervised Text Clustering Algorithm with Word Distribution Weights


Abstract

Semi-supervised text clustering, a research branch of text clustering, uses limited prior knowledge to guide the otherwise unsupervised clustering process and help users obtain better clustering results. Because labeled data are difficult, expensive, and time-consuming to obtain, it is important to use the available supervised information effectively to improve clustering performance. This paper proposes a semi-supervised LDA text clustering algorithm based on word distribution weights (WWDLDA). By introducing word distribution coefficients obtained from labeled data, the LDA model can be applied to semi-supervised clustering. During clustering, these coefficients continually adjust the word distribution and thereby change the clustering results. Experimental results on real data sets show that the proposed algorithm achieves better clustering results than constrained mixmnl, where mixmnl denotes the multinomial model-based EM algorithm.
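
The abstract does not give the formula for the word distribution coefficients, so the following is only a minimal sketch of the general idea under stated assumptions, not the WWDLDA method itself. It assumes a hypothetical coefficient defined as the ratio of a word's smoothed frequency within a labeled class to its smoothed frequency across all labeled documents, and it assumes the coefficients are applied by rescaling and renormalizing each topic-word distribution. The function names `word_distribution_weights` and `apply_weights`, the `smoothing` parameter, and the toy data are all illustrative.

```python
import numpy as np


def word_distribution_weights(counts, labels, n_clusters, smoothing=1.0):
    """Hypothetical coefficients: per-class word frequency over overall frequency.

    This ratio is an assumption made for illustration; the abstract does not
    state how the coefficients are computed.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    vocab_size = counts.shape[1]

    # Smoothed word distribution over all labeled documents.
    overall = counts.sum(axis=0) + smoothing
    overall /= overall.sum()

    weights = np.ones((n_clusters, vocab_size))
    for k in range(n_clusters):
        # Smoothed word distribution within the documents labeled as class k.
        class_counts = counts[labels == k].sum(axis=0) + smoothing
        weights[k] = (class_counts / class_counts.sum()) / overall
    return weights


def apply_weights(phi, weights):
    """Rescale each topic-word distribution and renormalize rows to sum to 1."""
    adjusted = np.asarray(phi, dtype=float) * weights
    return adjusted / adjusted.sum(axis=1, keepdims=True)


# Toy labeled data: 4 documents over a 3-word vocabulary, 2 classes.
labeled_counts = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 1], [0, 3, 2]])
labeled_classes = np.array([0, 0, 1, 1])
weights = word_distribution_weights(labeled_counts, labeled_classes, n_clusters=2)

# Start from uniform topic-word distributions and let the coefficients shift them.
phi = np.full((2, 3), 1.0 / 3.0)
adjusted_phi = apply_weights(phi, weights)
```

In a full sampler, a step like `apply_weights` would presumably run at every iteration so that the labeled data keeps steering the topic-word distributions. That placement is likewise an assumption drawn from the abstract's statement that the coefficients adjust the word distribution throughout clustering.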
