Scalable Clustering for Large High-Dimensional Data Based on Data Summarization

Abstract

Clustering large, high-dimensional data sets is a challenging data-mining task. This paper presents a framework for performing such a task efficiently. It is based on the notion of data space reduction, which finds high-density areas, or dense cells, in the given feature space. The dense cells store summarized information about the data. A designated partitioning or hierarchical clustering algorithm can then be used as the second step to find clusters based on the data summaries. Using K-means as an example, this paper presents GARDEN-Kmeans, which performs data space reduction using Gamma Region DENsity partition and uses K-means to cluster the summarized information. The experimental study shows that GARDEN-Kmeans executes several orders of magnitude faster than basic K-means and the recursive bisection K-means algorithm of CLUTO, while producing comparable clustering quality.
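
The two-step idea described in the abstract (summarize the data into dense cells, then cluster the summaries) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's method: the abstract does not say how the Gamma Region DENsity partition is built, so a plain uniform grid stands in for it here. The names `summarize`, `weighted_kmeans`, `cell_width`, and `min_count` are hypothetical and do not come from the paper.

```python
import math
import random
from collections import defaultdict


def summarize(points, cell_width, min_count):
    """Bin points into a uniform grid (an assumed stand-in for the paper's
    partition) and keep only dense cells.

    Returns a list of (centroid, count) pairs, one per dense cell.
    """
    cells = defaultdict(lambda: [0, None])  # cell index -> [count, coordinate sums]
    for p in points:
        key = tuple(math.floor(x / cell_width) for x in p)
        entry = cells[key]
        entry[0] += 1
        entry[1] = list(p) if entry[1] is None else [s + x for s, x in zip(entry[1], p)]
    return [
        (tuple(s / count for s in sums), count)
        for count, sums in cells.values()
        if count >= min_count
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(summaries, k, iterations=20):
    """Lloyd-style K-means over cell centroids, weighted by point counts."""
    # Deterministic farthest-point seeding, starting from the heaviest cell.
    centers = [max(summaries, key=lambda s: s[1])[0]]
    while len(centers) < k:
        centers.append(
            max(summaries, key=lambda s: min(sq_dist(s[0], c) for c in centers))[0]
        )
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for centroid, weight in summaries:
            nearest = min(range(k), key=lambda i: sq_dist(centroid, centers[i]))
            groups[nearest].append((centroid, weight))
        for i, members in enumerate(groups):
            if not members:
                continue  # keep the old center if a cluster becomes empty
            total = sum(w for _, w in members)
            dim = len(members[0][0])
            centers[i] = tuple(
                sum(c[d] * w for c, w in members) / total for d in range(dim)
            )
    return centers


# Synthetic example: two well-separated 2-D blobs of 500 points each.
rng = random.Random(0)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(500)]
points += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(500)]

summaries = summarize(points, cell_width=1.0, min_count=5)
centers = sorted(weighted_kmeans(summaries, k=2))
```

In this sketch the clustering step iterates over a handful of cell summaries rather than all 1,000 points, which is the general source of the speed-up the abstract claims. The size of the gain in the paper depends on its actual partitioning scheme, which this example does not reproduce.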