
A Hybrid Algorithm for Web Document Clustering Based on Frequent Term Sets and k-Means



Abstract

To address the major challenges of current web document clustering, i.e., the huge volume of documents, high-dimensional processing, and the understandability of the resulting clusters, we propose a simple hybrid algorithm (SHDC) based on top-k frequent term sets and k-means. The top-k frequent term sets are used to produce k initial means, which serve as the initial clusters and are then refined by k-means. The final clustering is returned by k-means, while the k frequent term sets provide an understandable description of the clusters. Experimental results on two public datasets indicate that SHDC outperforms two other representative clustering algorithms (farthest-first k-means and randomly initialized k-means) in both efficiency and effectiveness.
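The pipeline summarized above (frequent term sets seed k-means, then label the clusters) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes exhaustive counting of term sets of up to two terms, ranking by support, raw term-frequency vectors, and Euclidean distance. The abstract does not specify SHDC's actual term-set mining, ranking, or similarity measure. The function names (`top_k_term_sets`, `vectorize`, `shdc_sketch`) and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_k_term_sets(docs, k, max_size=2):
    # Assumption: brute-force support counting for term sets of up to
    # max_size terms (fine for toy data; a real system would use a miner).
    support = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for size in range(1, max_size + 1):
            support.update(combinations(terms, size))
    # Assumption: rank by support, then prefer longer sets, then lexicographic.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return [term_set for term_set, _ in ranked[:k]]


def vectorize(docs):
    # Raw term-frequency vectors over a sorted vocabulary.
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for doc in docs:
        v = [0.0] * len(vocab)
        for t in doc:
            v[index[t]] += 1.0
        vectors.append(v)
    return vectors


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shdc_sketch(docs, k, max_iter=20):
    term_sets = top_k_term_sets(docs, k)
    vectors = vectorize(docs)
    # Initial mean i: centroid of the documents containing every term of
    # term set i (each set has support >= 1, so the list is never empty).
    means = [mean([v for d, v in zip(docs, vectors) if set(ts) <= set(d)])
             for ts in term_sets]
    assign = None
    for _ in range(max_iter):
        # Standard k-means step: assign each document to its nearest mean.
        new_assign = [min(range(len(means)), key=lambda i: sq_dist(v, means[i]))
                      for v in vectors]
        if new_assign == assign:
            break  # converged: assignments did not change
        assign = new_assign
        for i in range(len(means)):
            members = [v for v, a in zip(vectors, assign) if a == i]
            if members:  # keep the previous mean if a cluster empties
                means[i] = mean(members)
    # Cluster i is described by term set i.
    return term_sets, assign


docs = [
    ["apple", "banana", "fruit"],
    ["apple", "fruit", "juice"],
    ["banana", "fruit", "smoothie"],
    ["python", "code", "java"],
    ["python", "code", "bug"],
    ["java", "code", "compiler"],
]
labels, clusters = shdc_sketch(docs, k=2)
print(labels)    # [('code',), ('fruit',)]
print(clusters)  # [1, 1, 1, 0, 0, 0]
```

Seeding from term-set coverage gives k-means a starting point tied to human-readable terms, so each final cluster keeps a label. On real corpora the highest-support term sets can overlap heavily, so a practical version would likely need a selection rule that favors low overlap. That rule is an open design choice here, not something stated in the abstract.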
