Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

A feature group weighting method for subspace clustering of high-dimensional data

Abstract

This paper proposes a new method for weighting subspaces at two levels, feature groups and individual features, when clustering high-dimensional data. The features of the data are first divided into feature groups based on their natural characteristics. Two types of weights are then introduced into the clustering process to simultaneously identify the importance of feature groups and of individual features in each cluster. A new optimization model defines the clustering objective, and a new clustering algorithm, FG-k-means, is proposed to solve it. The algorithm extends k-means with two additional steps that automatically calculate the two types of subspace weights. A new data generation method is also presented for generating high-dimensional data with clusters in subspaces of both feature groups and individual features. Experimental results on synthetic and real-life data show that FG-k-means significantly outperformed four k-means type algorithms (k-means, W-k-means, LAC, and EWKM) in almost all experiments. The new algorithm is robust to noise and missing values, which commonly exist in high-dimensional data.
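The four-step loop described in the abstract (assign points, update centres, update group weights, update feature weights) can be illustrated with a minimal sketch. The abstract does not give the objective function or the weight update formulas, so everything specific below is an assumption: the entropy-style (softmax) weight updates, the parameters `lam` and `eta`, the squared Euclidean distance, and the name `fg_kmeans_sketch` are all hypothetical and should not be read as the authors' FG-k-means.

```python
import numpy as np

def _softmax_neg(cost, scale):
    """Return exp(-cost / scale), normalised along the last axis (stable form)."""
    z = -cost / scale
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fg_kmeans_sketch(X, groups, k, lam=1.0, eta=1.0, n_iter=20, seed=0):
    """Illustrative two-level weighted k-means (assumed updates, not the paper's).

    X      : (n, m) data matrix
    groups : length-m integer array, group id (0..T-1) of each feature
    Returns labels, centres Z, group weights W (k, T), feature weights V (k, m).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, m = X.shape
    T = int(groups.max()) + 1

    Z = X[rng.choice(n, size=k, replace=False)].copy()  # initial centres
    W = np.full((k, T), 1.0 / T)                         # group weights, uniform
    V = np.empty((k, m))                                 # feature weights, uniform per group
    for t in range(T):
        idx = groups == t
        V[:, idx] = 1.0 / idx.sum()

    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Step 1: assign each point to the centre with the smallest weighted distance.
        sq = (X[:, None, :] - Z[None, :, :]) ** 2        # (n, k, m)
        coef = W[:, groups] * V                          # (k, m) combined weights
        labels = (sq * coef[None]).sum(axis=2).argmin(axis=1)

        # Step 2: update centres; per-feature dispersion inside each cluster.
        disp = np.zeros((k, m))
        for l in range(k):
            members = X[labels == l]
            if len(members):
                Z[l] = members.mean(axis=0)
                disp[l] = ((members - Z[l]) ** 2).sum(axis=0)

        # Step 3 (assumed form): group weights shrink as group dispersion grows.
        D = np.stack(
            [(V[:, groups == t] * disp[:, groups == t]).sum(axis=1) for t in range(T)],
            axis=1,
        )                                                # (k, T)
        W = _softmax_neg(D, lam)

        # Step 4 (assumed form): feature weights, normalised within each group.
        E = W[:, groups] * disp                          # (k, m)
        for t in range(T):
            idx = groups == t
            V[:, idx] = _softmax_neg(E[:, idx], eta)

    return labels, Z, W, V
```

In this sketch each cluster's group weights sum to 1 across groups, and its feature weights sum to 1 within each group, so a cluster can emphasise both a whole feature group and particular features inside it. The `(n, k, m)` distance array keeps the code short but would need chunking for genuinely high-dimensional data.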