Fast Algorithms for Projected Clustering

Abstract

The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces, not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications, different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem, and test its performance on synthetic data.
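As a rough illustration of what "cluster-specific subsets of dimensions" can mean, the Python sketch below assumes cluster memberships are already given and picks, for each cluster, the dimensions along which its points lie closest to the cluster centroid. This is not the algorithmic framework developed in the paper. The function name `select_dimensions`, the parameter `dims_per_cluster`, the average-absolute-deviation criterion and the toy data are all assumptions made for this example.

```python
from statistics import mean


def select_dimensions(points, labels, dims_per_cluster):
    """For each cluster, return the indices of the dimensions along which
    its points are most tightly packed around the cluster centroid.

    Illustrative only: assumes labels are known in advance and that every
    cluster keeps the same number of dimensions.
    """
    # Group the points by their (given) cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    selected = {}
    for label, members in clusters.items():
        n_dims = len(members[0])
        centroid = [mean(p[d] for p in members) for d in range(n_dims)]
        # Average absolute deviation from the centroid, per dimension.
        spread = [
            mean(abs(p[d] - centroid[d]) for p in members)
            for d in range(n_dims)
        ]
        # Keep the dimensions with the smallest spread.
        ranked = sorted(range(n_dims), key=lambda d: spread[d])
        selected[label] = sorted(ranked[:dims_per_cluster])
    return selected


# Hypothetical toy data: cluster "a" is tight in dimensions 0 and 1,
# cluster "b" is tight in dimensions 1 and 2.
points = [
    (1.0, 1.0, 0.0), (1.1, 0.9, 5.0), (0.9, 1.1, 9.0),
    (0.0, 4.0, 4.0), (6.0, 4.1, 3.9), (9.0, 3.9, 4.1),
]
labels = ["a", "a", "a", "b", "b", "b"]

print(select_dimensions(points, labels, dims_per_cluster=2))
# → {'a': [0, 1], 'b': [1, 2]}
```

On this toy data no single pair of dimensions suits both clusters, which is the situation the abstract describes as the weakness of selecting one global feature subset.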