IEEE International Conference on Big Data

Scalable Bottom-up Subspace Clustering using FP-Trees for High Dimensional Data

Abstract

Subspace clustering aims to find groups of similar objects (clusters) that exist in lower-dimensional subspaces of a high-dimensional dataset. It has a wide range of applications, such as analysing high-dimensional sensor data or DNA sequences. However, existing algorithms are limited in their ability to find clusters in non-disjoint subspaces and to scale to large data, which restricts their applicability in areas such as bioinformatics and the Internet of Things. We aim to address these limitations by proposing a subspace clustering algorithm that uses a bottom-up strategy. Our algorithm first searches for base clusters in low-dimensional subspaces. It then forms clusters in higher-dimensional subspaces from these base clusters, a step we formulate as a frequent pattern mining problem. This formulation enables an efficient search for clusters in higher-dimensional subspaces, which is carried out using FP-trees. The proposed algorithm is evaluated against traditional bottom-up clustering algorithms and state-of-the-art subspace clustering algorithms. The experimental results show that the proposed algorithm produces clusters with high accuracy and scales well to large volumes of data. We also demonstrate the algorithm's performance on ten real-life genomic datasets.
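The abstract describes the pipeline only at a high level. What follows is therefore a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes base clusters are found independently in each single dimension using a hypothetical gap-based rule (`eps`, `min_pts`). Each object is then encoded as a transaction listing the base clusters it belongs to, and frequent itemsets are mined with a plain FP-growth over an FP-tree. A frequent itemset that spans two or more dimensions is read as a candidate higher-dimensional cluster, with `min_pts` reused as the minimum support. All function names and parameters are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: 1-D base clusters -> one transaction per object ->
# FP-growth frequent itemsets. Base-clustering rule and names are assumptions.


def base_clusters_1d(values, eps, min_pts):
    """Assumed 1-D base clustering: split sorted values at gaps > eps and
    keep groups with at least min_pts objects (returned as sets of ids)."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters, current = [], [order[0]]
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > eps:
            clusters.append(current)
            current = []
        current.append(cur)
    clusters.append(current)
    return [set(c) for c in clusters if len(c) >= min_pts]


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs. Returns a header table
    (item -> nodes holding it) and the support of each frequent item."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in items:
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root, header = FPNode(None, None), defaultdict(list)
    for items, count in transactions:
        # Descending-support order lets transactions share tree prefixes.
        path = sorted((i for i in items if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every itemset with support >= min_support."""
    header, frequent = build_fp_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        yield itemset, frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)


def subspace_clusters(data, eps, min_pts):
    """Return (dimensions, object ids) for each frequent combination of
    base clusters that spans at least two dimensions."""
    base = {}  # (dimension, cluster index) -> set of object ids
    for d in range(len(data[0])):
        column = [row[d] for row in data]
        for c, members in enumerate(base_clusters_1d(column, eps, min_pts)):
            base[(d, c)] = members
    # Each object becomes a transaction of the base clusters it falls in.
    transactions = [([key for key, m in base.items() if obj in m], 1)
                    for obj in range(len(data))]
    results = []
    for itemset, _support in fp_growth(transactions, min_pts):
        if len(itemset) < 2:
            continue  # a single base cluster is not a higher-dimensional cluster
        objects = set.intersection(*(base[key] for key in itemset))
        results.append((sorted(d for d, _ in itemset), sorted(objects)))
    return results


if __name__ == "__main__":
    data = [[1.0, 1.0, 0.0], [1.1, 1.2, 3.0], [1.2, 1.1, 6.0],
            [4.0, 7.0, 9.0], [7.0, 4.0, 9.1], [9.5, 9.5, 9.2]]
    print(subspace_clusters(data, eps=0.5, min_pts=3))  # [([0, 1], [0, 1, 2])]
```

Under this rule each object lies in at most one base cluster per dimension, so no itemset mixes two base clusters from the same dimension. The miner tracks only supports. The member objects of each candidate cluster are recovered afterwards by intersecting the member sets of its base clusters.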