Journal: BioData Mining

Soft document clustering using a novel graph covering approach

Abstract

In text mining, document clustering is the task of assigning unstructured documents to clusters, which in turn usually correspond to topics. Clustering is widely used in science for data retrieval and organisation. In this paper we present and discuss a novel graph-theoretical approach to document clustering and its application to a real-world data set. We show that the well-known partition of a graph into stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This generalization allows soft clustering as well as hard clustering. The software is freely available on GitHub. The presented integer linear programming approach and the greedy approach for this $\mathcal{NP}$-complete problem lead to valuable results on random instances and on some real-world data for different similarity measures. We show that PS-Document Clustering is a remarkable approach to document clustering and opens the complete toolbox of graph theory to this field.
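
The abstract does not define pseudostable sets or spell out the greedy procedure, so the following is only a minimal sketch of the classical hard-clustering baseline it mentions: partitioning a similarity graph into cliques. Everything here is an assumption made for illustration, including the Jaccard similarity on word sets, the threshold value, and the function names; it is not the authors' PS-Document Clustering and it produces no soft (overlapping) clusters.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (illustrative similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_clique_partition(docs, threshold=0.3):
    """Greedily partition documents into cliques of a similarity graph.

    Two documents are adjacent when their similarity is >= threshold.
    Each returned cluster is a clique: its members are pairwise adjacent.
    Hypothetical baseline only, not the paper's algorithm.
    """
    tokens = [set(d.lower().split()) for d in docs]
    n = len(docs)

    # Build the adjacency matrix of the thresholded similarity graph.
    adjacent = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            adjacent[i][j] = adjacent[j][i] = True

    clusters = []
    for v in range(n):
        # Put v into the first cluster whose members are all adjacent to v.
        for cluster in clusters:
            if all(adjacent[v][u] for u in cluster):
                cluster.append(v)
                break
        else:
            # No existing clique can absorb v, so it starts a new one.
            clusters.append([v])
    return clusters


docs = [
    "graph clustering of documents",
    "clustering documents with a graph",
    "integer linear programming solver",
    "linear programming integer model",
]
print(greedy_clique_partition(docs))  # [[0, 1], [2, 3]]
```

A greedy pass like this is fast but gives no guarantee of using the minimum number of cliques, which is why an exact formulation such as integer linear programming is the natural counterpart for an NP-complete partition problem.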
