Home > Foreign-language conference papers > Exploiting the Knowledge Base: Applications of Rule Based Control > O-Cluster: scalable clustering of large high dimensional data sets

O-Cluster: scalable clustering of large high dimensional data sets

Abstract

Clustering large data sets of high dimensionality has always been a challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to handle data sets with a very large number of records, a very high number of dimensions, or both. We discuss the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To overcome both the "curse of dimensionality" and the scalability problems associated with large amounts of data, we propose a new clustering algorithm called O-Cluster. O-Cluster combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustness to noise, and O-Cluster's excellent scalability.
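
The idea of axis-parallel partitioning around high-density areas can be illustrated with a small sketch. The abstract does not state how O-Cluster chooses its cutting planes, so everything below is an assumption made for illustration: the equal-width histogram, the `ratio` valley threshold, the `min_size` stopping rule, and the function names `find_valley_split` and `partition` are all hypothetical. The active sampling step and the limited memory buffer are not modeled.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def histogram(values: Sequence[float], bins: int) -> Tuple[List[int], float, float]:
    """Count values into equal-width bins; returns (counts, low, bin_width)."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # fall back to 1.0 for constant data
    counts = [0] * bins
    for v in values:
        idx = min(int((v - low) / width), bins - 1)  # clamp the maximum value
        counts[idx] += 1
    return counts, low, width


def find_valley_split(values: Sequence[float], bins: int = 10,
                      ratio: float = 0.5) -> Optional[float]:
    """Return a cut at the emptiest bin that lies between two denser peaks.

    Assumed criterion (not taken from the paper): a bin is a valley when its
    count is below `ratio` times the smaller of the peaks on either side.
    """
    counts, low, width = histogram(values, bins)
    best: Optional[Tuple[int, int]] = None  # (valley count, bin index)
    for i in range(1, bins - 1):
        left_peak = max(counts[:i])
        right_peak = max(counts[i + 1:])
        if counts[i] < ratio * min(left_peak, right_peak):
            if best is None or counts[i] < best[0]:
                best = (counts[i], i)
    if best is None:
        return None
    return low + (best[1] + 0.5) * width  # cut through the valley bin's centre


def partition(points: List[Point], bins: int = 10, ratio: float = 0.5,
              min_size: int = 20) -> List[List[Point]]:
    """Recursively apply axis-parallel cuts; each leaf is one dense region."""
    if len(points) < min_size:
        return [points]
    for d in range(len(points[0])):
        cut = find_valley_split([p[d] for p in points], bins, ratio)
        if cut is None:
            continue
        left = [p for p in points if p[d] < cut]
        right = [p for p in points if p[d] >= cut]
        if left and right:  # both sides shrink, so the recursion terminates
            return (partition(left, bins, ratio, min_size)
                    + partition(right, bins, ratio, min_size))
    return [points]


# Two well-separated 5x5 grids of points, near (0, 0) and near (10, 10).
blob_a = [(i % 5 * 0.2, i // 5 * 0.2) for i in range(25)]
blob_b = [(x + 10.0, y + 10.0) for x, y in blob_a]
parts = partition(blob_a + blob_b, min_size=30)
```

In this toy input a single cut perpendicular to the first axis separates the two groups, and `min_size=30` stops the recursion before the small grids are split further. A fixed ratio threshold is only a stand-in for whatever density criterion the paper actually uses.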
