
Fast algorithms for projected clustering


Abstract

The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem, and test its performance on synthetic data.
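
The idea that each cluster has its own relevant subset of dimensions can be illustrated with a small sketch. This is not the algorithmic framework the abstract refers to; it only shows the dimension-selection idea under simplifying assumptions. The cluster labels are assumed to be given already, every cluster keeps the same number of dimensions (the abstract notes this number may vary), and per-dimension variance is used as a stand-in for "closely correlated". The function name `cluster_dimensions` and its parameters are hypothetical.

```python
from statistics import pvariance


def cluster_dimensions(points, labels, dims_per_cluster):
    """For each cluster label, return the indices of the dimensions along
    which that cluster's points are most tightly packed (lowest variance).

    Assumption: labels are already known; a real projected clustering
    method has to find the clusters and their dimensions together.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    selected = {}
    for label, members in clusters.items():
        n_dims = len(members[0])
        # Spread of this cluster along each dimension.
        spread = [pvariance([p[d] for p in members]) for d in range(n_dims)]
        # Keep the dimensions with the smallest spread.
        ranked = sorted(range(n_dims), key=lambda d: spread[d])
        selected[label] = sorted(ranked[:dims_per_cluster])
    return selected


# Toy data: cluster "a" is tight in dimensions 0 and 1,
# cluster "b" is tight in dimensions 1 and 2.
points = [
    (1.0, 1.0, 0.0), (1.1, 0.9, 5.0), (0.9, 1.1, 9.0),
    (0.0, 4.0, 4.0), (6.0, 4.1, 3.9), (9.0, 3.9, 4.1),
]
labels = ["a", "a", "a", "b", "b", "b"]
subspaces = cluster_dimensions(points, labels, dims_per_cluster=2)
print(subspaces)  # {'a': [0, 1], 'b': [1, 2]}
```

In this toy data no single pair of dimensions suits both clusters, which is the situation the abstract describes as the weakness of global feature selection.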

