Conference: IEEE Symposium on Computational Intelligence and Data Mining
Scalable Clustering for Large High-Dimensional Data Based on Data Summarization

Abstract

Clustering large, high-dimensional data sets is a challenging data-mining task. This paper presents a framework for performing it efficiently. The framework is based on data space reduction, which finds high-density areas, or dense cells, in the given feature space. The dense cells store summarized information about the data. In a second step, a designated partitioning or hierarchical clustering algorithm finds clusters from these data summaries. Using Kmeans as an example, this paper presents GARDEN-Kmeans, which performs data space reduction using Gamma Region DENsity partition and then uses Kmeans to cluster the summarized information. The experimental study shows that GARDEN-Kmeans executes several orders of magnitude faster than basic Kmeans and the recursive bisection Kmeans algorithm of CLUTO, while producing comparable clustering quality.
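
The two-step idea in the abstract (summarize the data into dense cells, then cluster the summaries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: a uniform grid stands in for the Gamma Region DENsity partition, whose details the abstract does not give, and the names summarize_dense_cells, weighted_kmeans, cell_width, and min_count are hypothetical.

```python
import math
import random
from collections import defaultdict


def summarize_dense_cells(points, cell_width, min_count):
    """Bucket points into a uniform grid (an assumed stand-in for the
    paper's partition) and keep only cells holding at least min_count points.

    Returns one (centroid, count) summary per dense cell.
    """
    sums = {}
    counts = defaultdict(int)
    for p in points:
        key = tuple(math.floor(x / cell_width) for x in p)
        if key not in sums:
            sums[key] = [0.0] * len(p)
        for i, x in enumerate(p):
            sums[key][i] += x
        counts[key] += 1
    return [
        (tuple(s / counts[key] for s in sums[key]), counts[key])
        for key in sums
        if counts[key] >= min_count
    ]


def weighted_kmeans(summaries, k, iterations=20, seed=0):
    """Lloyd's algorithm over cell centroids, each weighted by its point count."""
    rng = random.Random(seed)
    centers = [centroid for centroid, _ in rng.sample(summaries, k)]
    dim = len(centers[0])
    for _ in range(iterations):
        acc = [[0.0] * dim for _ in range(k)]
        weight = [0] * k
        for centroid, count in summaries:
            # Assign the whole cell to its nearest current center.
            j = min(range(k), key=lambda c: math.dist(centroid, centers[c]))
            weight[j] += count
            for i, x in enumerate(centroid):
                acc[j][i] += count * x
        # Recompute each center as the count-weighted mean of its cells;
        # a center that received no cells keeps its previous position.
        centers = [
            tuple(a / weight[j] for a in acc[j]) if weight[j] else centers[j]
            for j in range(k)
        ]
    return centers


# Usage: two synthetic 2-D blobs, summarized and then clustered.
rng = random.Random(1)
data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(500)]
data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(500)]
cells = summarize_dense_cells(data, cell_width=0.5, min_count=5)
centers = weighted_kmeans(cells, k=2)
```

In this sketch the clustering step iterates over the dense cells rather than the raw points, so its cost depends on the number of cells instead of the data set size. That is the general reason summarization can speed up clustering; the speedups reported in the abstract refer to GARDEN-Kmeans itself, not to this code.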