A k-means based co-clustering (kCC) algorithm for sparse, high dimensional data

Hussain Syed Fawad; Haris Muhammad

Abstract

The k-means algorithm is a widely used method that starts with an initial partitioning of the data and then iteratively converges towards a local solution by reducing the Sum of Squared Errors (SSE). It is known to suffer from the cluster center initialization problem, and the iterative step simply (re-)labels the data points based on the initial partition. Most improvements to k-means proposed in the literature focus on the initialization step alone and make no attempt to guide the iterative convergence by exploiting statistical information from the data. Using higher-order statistics (such as paths from random walks in a graph) and the duality in the data (as in co-clustering) are known ways to improve clustering results. What is unique and significant in our proposed approach is that we embed these concepts into the k-means algorithm itself, rather than using them only as an external distance measure, and present a unified framework called the k-means based co-clustering (kCC) algorithm. The initialization step is modified to use multiple points to represent each cluster center, such that points within a cluster are close together but far from points representing other clusters. Moreover, neighborhood walk statistics are proposed as a semantic similarity technique for both cluster assignment and center re-estimation in the iterative process. The effectiveness of the combined approach is evaluated on several standard data sets. Our results show that kCC performs better than the baseline k-means and other state-of-the-art improvements. © 2018 Elsevier Ltd. All rights reserved.
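
For orientation, the baseline procedure the abstract starts from (an initial partition followed by iterative relabeling that reduces the SSE) can be sketched as below. This is a minimal illustration of standard k-means only, not the authors' kCC method. The function name `kmeans`, its parameters, and the random choice of starting centers are assumptions made for this sketch.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Baseline Lloyd-style k-means (illustrative sketch, not kCC).

    Returns the final labels, the centers, and the SSE recorded at
    each assignment step.
    """
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    sse_history = []
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center
        # (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        sse_history.append(float(dists[np.arange(len(X)), labels].sum()))
        # Update step: move each center to the mean of its points.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving: a local solution was reached
        centers = new_centers
    return labels, centers, sse_history
```

The recorded SSE values never increase from one iteration to the next, but the final result depends on the starting centers. That dependence is the initialization problem the abstract refers to.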

Bibliographic details

Source
Expert Systems with Applications | 2019, Issue 3 | pp. 20-34 | 15 pages
Authors
Hussain Syed Fawad; Haris Muhammad
Author affiliations

GIK Inst, MDS Lab, Topi, Khyber Pakhtunk, Pakistan | GIK Inst, Fac Comp Sci & Engn, Topi, Khyber Pakhtunk, Pakistan

Original format: PDF
Language of text: English (eng)
Keywords
Clustering; K-means; Centroid initialization; Co-clustering; Semantic similarity

Similar literature

1. An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data [J]. Jing Liping, Ng Michael K., Huang Joshua Zhexue. IEEE Transactions on Knowledge and Data Engineering, 2007, Issue 8.
2. A heuristic-based fuzzy co-clustering algorithm for categorization of high-dimensional data [J]. William-Chandra Tjhi, Lihui Chen. Fuzzy Sets and Systems, 2008, Issue 4.
3. Distance based k-means clustering algorithm for determining number of clusters for high dimensional data [J]. Alibuhtto M., Mahat N. Decision Science Letters, 2020, Issue 1.
4. K-Means Parallel Acceleration for Sparse Data Dimensions on Flink [C]. Zihao Zeng, Kenli Li, Mingxing Duan. IEEE International Conference on High Performance Computing and Communications; IEEE International Conference on Smart City; IEEE International Conference on Data Science and Systems, 2019.
5. Efficient algorithms and software for mining sparse, high-dimensional data [D]. Yoon, Hankil. 2000.
6. A Sparse Structure Learning Algorithm for Gaussian Bayesian Network Identification from High-Dimensional Data [O]. Shuai Huang, Jing Li, Jieping Ye.
7. Sparse K-Means with $\ell_{\infty}/\ell_0$ Penalty for High-Dimensional Data Clustering [O]. Chang, Xiangyu, Wang, Yu, Li, Rongjian. 2014.
