O-Cluster: scalable clustering of large high dimensional data sets

Abstract

Clustering large, high-dimensional data sets has always been a challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to handle data sets with a very large number of records, a very high number of dimensions, or both. We discuss the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To overcome the "curse of dimensionality" and the scalability problems associated with large amounts of data simultaneously, we propose a new clustering algorithm called O-Cluster. O-Cluster combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustness to noise, and O-Cluster's excellent scalability.
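The abstract only names the ingredients of O-Cluster. As a rough illustration of what axis-parallel partitioning around dense regions can look like, the Python sketch below projects the points onto each axis, builds an equal-width histogram, and cuts the data at the deepest low-density valley that lies between two denser bins. It is a toy, not the authors' algorithm: the bin count, the "depth" score, and the helper names (`histogram`, `valley_split`, `split_partition`) are assumptions of this sketch, and it leaves out O-Cluster's own split-acceptance criterion, the active sampling over a limited memory buffer, and any recursion.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[int]:
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi lands in the last bin instead of overflowing.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def valley_split(values: Sequence[float], bins: int) -> Optional[Tuple[int, float]]:
    """Return (depth, cut) for the deepest valley flanked by higher bins, or None.

    Hypothetical score for this sketch: depth = min(highest bin on the left,
    highest bin on the right) - count in the valley bin.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return None  # all values identical: nothing to split
    counts = histogram(values, lo, hi, bins)
    best: Optional[Tuple[int, int]] = None  # (depth, bin index)
    for i in range(1, bins - 1):
        depth = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if depth > 0 and (best is None or depth > best[0]):
            best = (depth, i)
    if best is None:
        return None  # no interior valley, e.g. a unimodal histogram
    depth, i = best
    width = (hi - lo) / bins
    return depth, lo + (i + 0.5) * width  # cut at the centre of the valley bin


def split_partition(points: Sequence[Point], bins: int = 10):
    """Pick the axis with the deepest valley and split the points there.

    Returns (dim, cut, left, right), or None if no axis shows a valley.
    """
    if not points:
        return None
    best = None  # (depth, dim, cut)
    for dim in range(len(points[0])):
        found = valley_split([p[dim] for p in points], bins)
        if found is not None and (best is None or found[0] > best[0]):
            best = (found[0], dim, found[1])
    if best is None:
        return None
    _, dim, cut = best
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return dim, cut, left, right


# Two groups that are separated along the first axis only.
points = [(0.0, 5.0), (0.5, 1.0), (1.0, 9.0), (0.2, 3.0),
          (9.0, 2.0), (9.5, 7.0), (10.0, 4.0), (9.8, 6.0)]
dim, cut, left, right = split_partition(points)
print(dim, cut, len(left), len(right))  # 0 2.5 4 4
```

One natural extension is to apply the split recursively to each side until no axis shows a valley; deciding when a valley is deep enough to be trusted is exactly the part this sketch does not attempt.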

Bibliographic record

Source
Exploiting the Knowledge Base: Applications of Rule Based Control | 1989 | pp. 290-297 | 8 pages
Authors
Milenova B.L.; Campos M.M.;
Affiliation

Oracle Data Mining Technologies, Burlington, MA, USA

Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology
Date added to database: 2022-08-26 13:52:27

Similar literature

1. Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods: evaluation by large-scale real data [J]. Takitoh S, Fujii S, Mase Y. Bioinformatics, 2007, no. 4.
2. Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods: evaluation by large-scale real data [J]. Shuichi Takitoh, Shogo Fujii, Yoichi Mase. Bioinformatics, 2007, no. 4.
3. Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods: evaluation by large-scale real data [J]. Shuichi Takitoh, Shogo Fujii, Yoichi Mase, Junichi Takasaki, Toshimasa Yamazaki, Yozo Ohnishi, Masao Yanagisawa, Yusuke Nakamura, and Naoyuki Kamatani. Bioinformatics, 2007, no. 4.
4. O-Cluster: scalable clustering of large high dimensional data sets [C]. Milenova, B.L., Campos, M.M. 2002.
5. Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on GPU clusters [D]. Dashti, Ali. 2013.
6. Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters [O]. Ali Dashti, Ivan Komarov, Roshan M. D’Souza.
7. Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods: evaluation by large-scale real data [O]. S. Takitoh, S. Fujii, Y. Mase. 2007.
8. Use of a Satellite Climatological Data Set to Infer Large Scale Three Dimensional Flow Characteristics [R]. Lerner, J. A., Jedlovec, G. J., Atkinson, R. J. 1998.
