A Hybrid Algorithm for Web Document Clustering Based on Frequent Term Sets and k-Means

Abstract

To address the major challenges of current web document clustering, i.e. the huge volume of documents, high-dimensional processing, and the understandability of clusters, we propose a simple hybrid algorithm (SHDC) based on top-k frequent term sets and k-means. The top-k frequent term sets are used to produce k initial means, which are regarded as initial clusters and further refined by k-means. The final clustering is returned by k-means, while an understandable description of each cluster is provided by the k frequent term sets. Experimental results on two public datasets indicate that SHDC outperforms two other representative clustering algorithms (farthest-first k-means and random-initialization k-means) in both efficiency and effectiveness.
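The seeding-then-refinement pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SHDC implementation. It makes these assumptions:

- Documents are treated as sets of whitespace tokens.
- Candidate term sets are limited to one or two terms that reach a hypothetical `min_support` threshold.
- The k seeds are picked by a hypothetical ranking: longer term sets first, then higher support.
- Each initial mean is the centroid of the documents containing every term of its set.
- Refinement is plain Euclidean k-means on L2-normalised binary vectors.

The names `shdc_sketch`, `top_k_term_sets` and `min_support` are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(doc):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return set(doc.lower().split())


def top_k_term_sets(doc_terms, k, min_support=2, max_size=2):
    # Naively count every term set of 1..max_size terms in each document.
    counts = Counter()
    for terms in doc_terms:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(terms), size))
    frequent = [(ts, c) for ts, c in counts.items() if c >= min_support]
    # Hypothetical ranking (not from the paper): longer sets first, then support.
    frequent.sort(key=lambda item: (len(item[0]), item[1], item[0]), reverse=True)
    return [ts for ts, _ in frequent[:k]]


def vectorize(doc_terms, vocab):
    # Binary vectors, L2-normalised so Euclidean distance tracks cosine distance.
    index = {t: i for i, t in enumerate(vocab)}
    vecs = []
    for terms in doc_terms:
        v = [0.0] * len(vocab)
        for t in terms:
            v[index[t]] = 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs


def mean(vectors):
    # Component-wise average of a non-empty list of vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vecs, centroids, max_iter=100):
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vecs]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: recompute centroids, keeping the old one if a cluster empties.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


def shdc_sketch(docs, k, min_support=2):
    doc_terms = [tokenize(d) for d in docs]
    vocab = sorted(set().union(*doc_terms))
    vecs = vectorize(doc_terms, vocab)
    term_sets = top_k_term_sets(doc_terms, k, min_support)
    # Each initial mean is the centroid of the documents containing all its terms.
    seeds = [mean([v for v, terms in zip(vecs, doc_terms) if set(ts) <= terms])
             for ts in term_sets]
    labels, _ = kmeans(vecs, seeds)
    return labels, term_sets


docs = [
    "apple banana fruit",
    "apple banana smoothie",
    "apple banana salad",
    "python java code",
    "python java compiler",
    "python java runtime",
]
labels, term_sets = shdc_sketch(docs, k=2)
print(labels)     # [1, 1, 1, 0, 0, 0]
print(term_sets)  # [('java', 'python'), ('apple', 'banana')]
```

Each cluster starts from a frequent term set, so the returned `term_sets` can double as cluster descriptions. This is the understandability benefit the abstract describes. Note that k-means may still move documents away from the term set that seeded their cluster. If fewer than k term sets reach `min_support`, this sketch simply returns fewer clusters.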

Bibliographic Record

Source: International Workshop on DataBase Management and Application over Networks (DBMAN 2007); International Workshop on Emerging Trends of Web Technologies and Applications (WebETrends 2007); International Workshop on Process Aware Information Systems (PAIS 2007), 2007, pp. 198-203 (6 pages)
Conference location: Huang Shan, China
Authors: Le Wang; Li Tian; Yan Jia; Weihong Han
Affiliation: Computer School, National University of Defense Technology, Changsha, China

Original format: PDF
Language: English
Chinese Library Classification: computer networks

Similar Literature

1. MLK-Means - A Hybrid Machine Learning based K-Means Clustering Algorithms for Document Clustering [J]. P. Perumal, R. Nedunchezhian. WSEAS Transactions on Information Science and Applications, 2012, no. 7a9.

2. A Hybrid Approach Using PSO and K-Means for Semantic Clustering of Web Documents [J]. J. Avanija, K. Ramar. Journal of Web Engineering, 2013, no. 3a4.

3. Data Clustering of Web Documents using Simple K-means Algorithm [J]. R. Dhanalakshmi, Riju Thomas. Australian Journal of Basic and Applied Sciences, 2015, no. 2015.

4. A Hybrid Algorithm for Web Document Clustering Based on Frequent Term Sets and k-Means [C]. Le Wang, Li Tian, Yan Jia. International Workshop on DataBase Management and Application over Networks (DBMAN 2007), 2007.

5. Study of Document Clustering Using the k-Means Algorithm [D]. Gummuluru, Meghna Sharma. 2006.

6. Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study [O]. Saeedeh Pourahmad, Atefeh Basirat, Amir Rahimi. 2020.

7. Document Clustering by Dynamic Hierarchical Algorithm Based on Fuzzy Set Type-II from Frequent Itemset [O]. Musa, Saiful Bahri, Kaswar, Andi Baso, Supria, Supria. 2016.
