The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the
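The abstract argues that different sets of points may cluster well in different subsets of dimensions, so no single global subset works for all clusters. The Python sketch below illustrates that argument on hypothetical synthetic data. Cluster A is tight only in dimensions 0 to 2, and cluster B is tight only in dimensions 5 and 6. Every other coordinate is uniform noise.

The helper names (`make_cluster`, `cluster_dimensions`, `segmental_distance`) are invented for illustration. So is the dimension-selection heuristic, which picks the dimensions with the smallest mean absolute deviation from the centroid. The distance is an averaged Manhattan distance restricted to a cluster's dimensions. This is only loosely in the spirit of projected clustering and is not the algorithm proposed in the paper.

```python
import random
import statistics


def make_cluster(n, dims, tight_dims, center, spread, rng):
    """Generate n points concentrated only along `tight_dims`;
    every other coordinate is uniform noise in [0, 100)."""
    points = []
    for _ in range(n):
        p = [rng.uniform(0, 100) for _ in range(dims)]
        for d in tight_dims:
            p[d] = rng.gauss(center, spread)
        points.append(p)
    return points


def centroid(points):
    """Per-dimension mean of a list of equal-length points."""
    return [statistics.fmean(col) for col in zip(*points)]


def cluster_dimensions(points, k):
    """Hypothetical heuristic: pick the k dimensions along which the points
    deviate least (mean absolute deviation) from their centroid."""
    c = centroid(points)
    spread = [
        (statistics.fmean(abs(p[d] - c[d]) for p in points), d)
        for d in range(len(c))
    ]
    return sorted(d for _, d in sorted(spread)[:k])


def segmental_distance(x, y, dims):
    """Manhattan distance restricted to `dims`, averaged over len(dims)."""
    return sum(abs(x[d] - y[d]) for d in dims) / len(dims)


rng = random.Random(42)
DIMS = 10
# Synthetic data: cluster A is tight in dims 0-2, cluster B in dims 5-6.
cluster_a = make_cluster(200, DIMS, [0, 1, 2], center=20, spread=1.0, rng=rng)
cluster_b = make_cluster(200, DIMS, [5, 6], center=70, spread=1.0, rng=rng)

dims_a = cluster_dimensions(cluster_a, k=3)
dims_b = cluster_dimensions(cluster_b, k=2)

# Distance of A's points to A's centroid, measured in A's own subspace
# versus in B's subspace (where A's coordinates are just noise).
c_a = centroid(cluster_a)
own = statistics.fmean(segmental_distance(p, c_a, dims_a) for p in cluster_a)
other = statistics.fmean(segmental_distance(p, c_a, dims_b) for p in cluster_a)

print(dims_a, dims_b)  # [0, 1, 2] [5, 6]
print(round(own, 1), round(other, 1))
```

Each cluster recovers a different subspace of a different size. Cluster A's points are compact only when measured in A's own dimensions. Measured in B's dimensions, the same points look like noise, which is the case the abstract makes against a single feature-selected subspace.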
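Clustering high-dimensional data using subspace and projected clustering algorithms
Fast projection-based algorithms for big data
A fast point-matching algorithm using clustering-correspondence projection based on quadratic programming
Fast algorithms for projected clustering
A fast conceptual clustering algorithm for data mining and visualization
A fast nonnegative matrix factorization algorithm based on the projected gradient method for solving large-scale problems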