O-Cluster: scalable clustering of large high dimensional data sets


Abstract

Clustering large data sets of high dimensionality has always been a challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to handle data sets with a very large number of records, a very high number of dimensions, or both. We provide a discussion of the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To simultaneously overcome both the "curse of dimensionality" and the scalability problems associated with large amounts of data, we propose a new clustering algorithm called O-Cluster. O-Cluster combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustness to noise, and O-Cluster's excellent scalability.
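The axis-parallel partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it only shows the general technique of projecting points onto each axis, looking for a low-density valley flanked by denser regions, and cutting there. The function names (`find_valley`, `partition`), the fixed-width histogram, and the `min_drop` ratio threshold are all assumptions made for illustration, and the active sampling and limited-buffer aspects are left out entirely.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def find_valley(values: Sequence[float], bins: int = 10,
                min_drop: float = 0.5) -> Optional[Tuple[float, float]]:
    """Return (depth, cut) for the deepest histogram valley, or None.

    A bin counts as a valley when its count is at most (1 - min_drop)
    times the smaller of the highest counts on its left and right.
    The threshold is an illustrative assumption, not a statistical test.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return None  # no spread along this axis, nothing to cut
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1

    best: Optional[Tuple[float, float]] = None
    for i in range(1, bins - 1):
        flank = min(max(counts[:i]), max(counts[i + 1:]))
        if flank > 0 and counts[i] <= (1 - min_drop) * flank:
            depth = float(flank - counts[i])
            cut = lo + (i + 0.5) * width  # middle of the valley bin
            if best is None or depth > best[0]:
                best = (depth, cut)
    return best


def partition(points: List[Point], bins: int = 10,
              min_size: int = 20) -> List[List[Point]]:
    """Recursively split points with axis-parallel cuts at density valleys."""
    if len(points) < min_size:
        return [list(points)]
    best_dim, best = -1, None
    for d in range(len(points[0])):
        found = find_valley([p[d] for p in points], bins)
        if found is not None and (best is None or found[0] > best[0]):
            best_dim, best = d, found
    if best is None:
        return [list(points)]  # no valley on any axis: treat as one cluster
    cut = best[1]
    left = [p for p in points if p[best_dim] < cut]
    right = [p for p in points if p[best_dim] >= cut]
    return partition(left, bins, min_size) + partition(right, bins, min_size)


if __name__ == "__main__":
    blob_a = [(0.05 * i, 0.05 * (i % 7)) for i in range(20)]
    blob_b = [(9.0 + 0.05 * i, 5.0 + 0.05 * (i % 7)) for i in range(20)]
    for cluster in partition(blob_a + blob_b, bins=10, min_size=30):
        print(len(cluster), cluster[0])
```

Because every cut is perpendicular to a single axis, each resulting cluster is a hyper-rectangle in the input space. A fixed ratio threshold like the one above can produce spurious splits on small or gridded samples, so a practical implementation would need a more careful test of whether a valley is significant.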

Bibliographic record

Source: 2002, pp. 290-297 (8 pages)
Authors: Milenova, B.L.; Campos, M.M.
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords: very large databases; pattern clustering; data mining; computational complexity; O-Cluster; scalable clustering; large high dimensional data sets; data handling; multidimensional data sets; scalability; complexity; active sampling technique

Similar documents

1. Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods: evaluation by large-scale real data [J]. Shuichi Takitoh, Shogo Fujii, Yoichi Mase, Junichi Takasaki, Toshimasa Yamazaki, Yozo Ohnishi, Masao Yanagisawa, Yusuke Nakamura, and Naoyuki Kamatani. Bioinformatics, 2007, Issue 4.

2. O-Cluster: scalable clustering of large high dimensional data sets [C]. Boriana L. Milenova, Marcos M. Campos. IEEE International Conference on Data Mining, 2002.

3. Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on GPU clusters [D]. Ali Dashti. 2013.

4. Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters [O]. Ali Dashti, Ivan Komarov, Roshan M. D'Souza.

5. Use of a Satellite Climatological Data Set to Infer Large Scale Three Dimensional Flow Characteristics [R]. Lerner, J. A., Jedlovec, G. J., Atkinson, R. J. 1998.
