
Subspace clustering methods for high dimensional data.


Abstract

Prominent research has shown that increasing data dimensionality results in the loss of contrast in distances between data points. Thus, clustering algorithms that measure the similarity between data points based on all features/attributes of a data set tend to break down in high dimensional spaces. In addition, not all attributes of a data set may be relevant for the clustering analysis.

Motivated by these observations, it has been hypothesized that data points may form clusters only when a subset of the attributes, i.e., a subspace, is considered. Furthermore, data points may belong to different clusters in different subspaces. Subspace and projected clustering techniques search for clusters of points in subsets of attributes. Subspace clustering enumerates clusters of points in all subsets of attributes, typically producing many overlapping clusters. Projected clustering computes several disjoint clusters, plus outliers, so that each cluster exists in its own subset of attributes.

In this thesis, we propose three novel techniques that advance the state-of-the-art in the subspace and projected clustering field. First, we propose a projected clustering technique, P3C, that (1) depends on parameters that can be set without prior knowledge about the data; (2) can effectively discover low dimensional clusters embedded in high dimensional spaces; (3) can compute disjoint or overlapping clusters. Second, we propose two extensions that make P3C the first projected clustering technique that can be applied to both numerical and categorical data sets. Third, we propose a novel problem formulation for subspace and projected clustering that aims at extracting non-redundant, axis-parallel, statistically significant regions from the data. The problem formulation is given as an optimization problem, for which exhaustive search is not a viable solution because of computational infeasibility. Therefore, we propose an approximation algorithm, STATPC, that has the same advantageous features as P3C but, in addition, guarantees that its solution stands out in the data in a statistical sense and is not just an artefact of the method.
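The loss of distance contrast mentioned at the start of the abstract can be illustrated with a small experiment. The sketch below is not taken from the thesis: the function name `relative_contrast`, the uniform random data, and the chosen sample sizes are all assumptions made for illustration. It measures how much farther the farthest point is than the nearest point, relative to the nearest distance, as the number of attributes grows.

```python
import math
import random


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Illustrative assumption: attributes are independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The printed ratio shrinks as dimensionality grows, meaning the nearest
    # and farthest neighbours become hard to tell apart by distance alone.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(500, dims), 3))
```

Under these assumptions the ratio is large in two dimensions and small in a thousand, which is the effect that motivates searching for clusters in subsets of attributes rather than in the full attribute space.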

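Bibliographic record

  • Author

    Moise, Gabriela

  • Author affiliation

    University of Alberta (Canada)

  • Degree-granting institution University of Alberta (Canada)
  • Discipline Computer Science
  • Degree Ph.D.
  • Year 2008
  • Pages 132 p.
  • Total pages 132
  • Original format PDF
  • Language of text English
  • Chinese Library Classification Geriatrics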