O-Cluster: scalable clustering of large high dimensional data sets

Abstract

Clustering large data sets of high dimensionality has always been a challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to handle data sets with a very large number of records, a very high number of dimensions, or both. We discuss the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To overcome both the "curse of dimensionality" and the scalability problems associated with large amounts of data simultaneously, we propose a new clustering algorithm called O-Cluster. O-Cluster combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustness to noise, and O-Cluster's excellent scalability.
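The abstract does not spell out how O-Cluster chooses its cuts. As a rough illustration of what an axis-parallel partitioning strategy aimed at "continuous areas of high density" can look like, the Python sketch below builds a 1-D histogram for each attribute and cuts at the deepest low-density bin (a valley) lying between two sufficiently high peaks. It then recurses until no such valley remains. This is a hypothetical sketch, not the paper's algorithm. The function names (`valley_split`, `best_axis_parallel_split`, `partition`) and the parameters `n_bins`, `min_rel_depth`, `min_peak_frac`, and `min_size` are assumptions made for this example. The sketch also leaves out the active sampling technique and the limited memory buffer, and it holds all data in memory.

```python
import numpy as np

# Hypothetical sketch of histogram-valley, axis-parallel partitioning.
# Names and thresholds are illustrative assumptions, not O-Cluster's own.


def valley_split(values, n_bins=10, min_rel_depth=0.8, min_peak_frac=0.2):
    """Return (threshold, depth) for the deepest significant valley, or None.

    A bin qualifies when both sides hold a peak of at least
    min_peak_frac * (highest bin), and the bin itself holds at most
    (1 - min_rel_depth) times the lower of those two side peaks.
    """
    counts, edges = np.histogram(values, bins=n_bins)
    top = counts.max()
    best = None
    for i in range(1, n_bins - 1):
        peak = min(counts[:i].max(), counts[i + 1:].max())  # lower side peak
        if peak == 0 or peak < min_peak_frac * top:
            continue  # one side is empty or too sparse to count as a peak
        if counts[i] > (1 - min_rel_depth) * peak:
            continue  # valley too shallow to be significant
        depth = peak - counts[i]
        if best is None or depth > best[1]:
            best = (0.5 * (edges[i] + edges[i + 1]), depth)  # cut at bin center
    return best


def best_axis_parallel_split(X, **kw):
    """Try every attribute; return (attribute, threshold) of the deepest valley."""
    best = None
    for j in range(X.shape[1]):
        found = valley_split(X[:, j], **kw)
        if found is not None and (best is None or found[1] > best[2]):
            best = (j, found[0], found[1])
    return None if best is None else (best[0], best[1])


def partition(X, min_size=50, **kw):
    """Recursively cut X with axis-parallel splits; return a list of index arrays."""
    leaves, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        split = best_axis_parallel_split(X[idx], **kw) if len(idx) >= min_size else None
        if split is None:
            leaves.append(idx)  # no significant valley: keep as one dense region
            continue
        j, t = split
        left = X[idx, j] <= t
        stack.extend([idx[left], idx[~left]])
    return leaves


rng = np.random.default_rng(0)
A = rng.uniform([0, 0], [1, 1], size=(300, 2))   # uniform block on [0,1] x [0,1]
B = rng.uniform([9, 0], [10, 1], size=(300, 2))  # uniform block on [9,10] x [0,1]
X = np.vstack([A, B])
print(best_axis_parallel_split(X))                # expect attribute 0, cut in the gap
print(sorted(len(leaf) for leaf in partition(X)))
```

The sketch requires a valley to be much lower than both neighbouring peaks, not merely lower than them. This keeps ordinary sampling noise inside a single dense region from triggering spurious cuts. The right level for that threshold is a tuning decision that this example does not settle.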