CC-K-means: a candidate centres-based K-means algorithm for text data

Xuan Li; Yongquan Liang; Yuhao Cai

Abstract

K-means, one of the most widely used clustering algorithms, is applied to many kinds of data thanks to its simplicity and efficiency. However, the traditional K-means algorithm selects its initial centre points at random, which can lead to slow convergence and unstable clustering results. To counter the effect of high dimensionality in text clustering, a latent semantic indexing (LSI) model is first adopted to reduce the dimensionality of the feature vectors, and a weighted adjusted cosine similarity is then used to measure the similarity between documents and obtain better clustering. While searching for the clustering centres, the high-density candidate centre points are partly updated, on the basis of density, to obtain the final clustering centres. Experimental results show that the proposed algorithm accurately finds representative and well-dispersed clustering centres and achieves better clustering performance.
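
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of density-based seeding for K-means, not the authors' CC-K-means. Everything in it is an assumption made for illustration: the function names (`cosine`, `pick_centres`, `kmeans`), the `sim_threshold` and `n_candidates` parameters, the neighbour-count definition of density, and the farthest-candidate selection rule are all hypothetical. Plain cosine similarity stands in for the paper's weighted adjusted cosine similarity, and the LSI step is omitted (the input vectors are assumed to be already reduced).

```python
import math


def cosine(u, v):
    """Plain cosine similarity (a stand-in for the weighted adjusted variant)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pick_centres(docs, k, sim_threshold=0.9, n_candidates=None):
    """Return indices of k seed documents chosen from high-density candidates.

    Assumed rule: density = number of other documents whose similarity is at
    least sim_threshold; the densest documents form the candidate set, and
    seeds are picked so that each new seed is as dissimilar as possible from
    the seeds already chosen.
    """
    n = len(docs)
    sims = [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    density = [
        sum(1 for j in range(n) if j != i and sims[i][j] >= sim_threshold)
        for i in range(n)
    ]
    # Stable sort: ties in density keep the original document order.
    order = sorted(range(n), key=lambda i: -density[i])
    candidates = order[: n_candidates or n]
    if k > len(candidates):
        raise ValueError("k exceeds the number of candidate centres")

    centres = [candidates[0]]  # start from the densest candidate
    while len(centres) < k:
        remaining = [c for c in candidates if c not in centres]
        # Choose the candidate least similar to its closest chosen seed.
        nxt = min(remaining, key=lambda c: max(sims[c][s] for s in centres))
        centres.append(nxt)
    return centres


def kmeans(docs, seeds, iters=20):
    """K-means with cosine-similarity assignment, started from the given seeds."""
    centres = [list(docs[i]) for i in seeds]
    labels = None
    for _ in range(iters):
        new_labels = [
            max(range(len(centres)), key=lambda c: cosine(d, centres[c]))
            for d in docs
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centres)):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy document vectors, assumed to be the output of a prior LSI reduction.
docs = [[1, 0], [0.9, 0.1], [0.95, 0.05], [0, 1], [0.1, 0.9], [0.05, 0.95]]
seeds = pick_centres(docs, k=2)
labels = kmeans(docs, seeds)
```

Because the seeds are chosen deterministically from the similarity matrix, repeated runs on the same data give the same clustering, which is the kind of stability the abstract contrasts with random initialisation.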

Bibliographic details

Source: International Journal of Collaborative Intelligence, 2016, Issue 3, pp. 189-204 (16 pages)
Authors: Xuan Li; Yongquan Liang; Yuhao Cai
Affiliation (all three authors): College of Information Science and Technology, Shandong University of Science and Technology, Qingdao, 266590, China
Original format: PDF
Language: English
Keywords: text clustering; LSI model; K-means algorithm; initial clustering centres; candidate centres
