Informatica: An International Journal of Computing and Informatics

A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function

Abstract

In this work, we address the problem of document clustering. We propose a novel unsupervised clustering method based on a structural analysis of the latent semantic space. Each document in this space is a vector of probabilities representing a distribution over topics. A document's membership in a cluster is computed from two criteria: the major topic of the document (qualitative criterion) and the distance between the probability vectors (quantitative criterion). We perform the structural analysis of the latent semantic space using Pretopology theory, which allows us to investigate the role that the number of clusters and the chosen centroids play in the similarity between the computed clusters. We applied our method to Twitter data and showed the accuracy of our results compared with choosing the number of clusters at random.
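The two-criteria membership rule and the pretopological expansion can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the neighborhood rule, or how centroids are chosen, so everything below is an assumption: Hellinger distance stands in for "the distance measure", a fixed `threshold` controls the quantitative criterion, centroids are given by index, and the function names (`build_neighborhoods`, `pseudoclosure`, `closure`, `cluster`) are hypothetical. The pseudoclosure uses one standard pretopological definition, a(A) = A ∪ {x : V(x) ∩ A ≠ ∅}, and each cluster is the fixed point reached by applying it repeatedly to a centroid. This is not the authors' implementation.

```python
import math

# Hypothetical sketch: theta[i] is document i's topic distribution (e.g. from LDA).

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (assumed metric)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def major_topic(p):
    """Index of the most probable topic (first one on ties)."""
    return max(range(len(p)), key=p.__getitem__)

def build_neighborhoods(theta, threshold):
    """V(i): documents sharing i's major topic (qualitative criterion)
    whose Hellinger distance to i is <= threshold (quantitative criterion)."""
    majors = [major_topic(p) for p in theta]
    return [
        {j for j in range(len(theta))
         if majors[j] == majors[i] and hellinger(theta[i], theta[j]) <= threshold}
        for i in range(len(theta))
    ]

def pseudoclosure(A, neigh):
    """a(A) = A ∪ {x : V(x) ∩ A ≠ ∅}."""
    A = set(A)
    return A | {x for x, v in enumerate(neigh) if v & A}

def closure(A, neigh):
    """Apply the pseudoclosure repeatedly until the set stops growing."""
    current = set(A)
    while True:
        expanded = pseudoclosure(current, neigh)
        if expanded == current:
            return current
        current = expanded

def cluster(theta, centroids, threshold):
    """Label each document with the index of a centroid whose closure contains it.
    A document reached by several closures goes to the nearest centroid;
    a document reached by none gets -1."""
    neigh = build_neighborhoods(theta, threshold)
    labels = [-1] * len(theta)
    best = [math.inf] * len(theta)
    for k, c in enumerate(centroids):
        for x in closure({c}, neigh):
            d = hellinger(theta[x], theta[c])
            if d < best[x]:
                best[x], labels[x] = d, k
    return labels

theta = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1],  # major topic 0
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],                    # major topic 1
    [0.1, 0.1, 0.8],                                     # major topic 2
]
print(cluster(theta, centroids=[0, 3], threshold=0.2))  # → [0, 0, 0, 1, 1, -1]
```

Here the major-topic check keeps the two topic groups apart even where their vectors are fairly close, while the threshold decides how far each closure spreads within a group. The last document shares its major topic with no centroid, so it stays unassigned. Varying `centroids` and `threshold` gives a small-scale view of how the number of clusters and the chosen centroids shape the resulting partition.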