Scalable Clustering for Large High-Dimensional Data Based on Data Summarization

Abstract

Clustering large, high-dimensional data sets is a challenging data-mining task. This paper presents a framework for performing this task efficiently. The framework is based on data space reduction, which finds high-density areas, or dense cells, in the given feature space. The dense cells store summarized information about the data. In a second step, a designated partitioning or hierarchical clustering algorithm finds clusters based on the data summaries. Using Kmeans as an example, this paper presents GARDEN-Kmeans, which performs data space reduction using Gamma Region DENsity partition and uses Kmeans to cluster the summarized information. The experimental study shows that GARDEN-Kmeans executes several orders of magnitude faster than basic Kmeans and the recursive bisection Kmeans algorithm of CLUTO, while producing comparable clustering quality.
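
The two-step idea described in the abstract (summarize dense cells, then cluster the summaries) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' GARDEN-Kmeans: a plain uniform grid stands in for the Gamma Region DENsity partition, whose details are not given in this record, and sparse cells are simply discarded. The names `summarize_dense_cells`, `weighted_kmeans`, `sq_dist`, `cell_width`, and `min_points` are hypothetical.

```python
import random
from collections import defaultdict


def summarize_dense_cells(points, cell_width, min_points):
    """Bucket points into a uniform grid; return (mean, count) per dense cell.

    Assumption: a uniform grid replaces the paper's Gamma Region DENsity partition.
    """
    cells = defaultdict(lambda: [0, None])  # cell key -> [count, per-dimension sum]
    for p in points:
        key = tuple(int(x // cell_width) for x in p)
        entry = cells[key]
        entry[0] += 1
        entry[1] = list(p) if entry[1] is None else [s + x for s, x in zip(entry[1], p)]
    summaries = []
    for count, total in cells.values():
        if count >= min_points:  # sparse cells are dropped in this sketch
            summaries.append(([s / count for s in total], count))
    return summaries


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(summaries, k, iters=50, seed=0):
    """Lloyd's iterations over cell means, each weighted by its point count."""
    rng = random.Random(seed)
    centers = [list(mean) for mean, _ in rng.sample(summaries, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for mean, count in summaries:
            nearest = min(range(k), key=lambda j: sq_dist(mean, centers[j]))
            groups[nearest].append((mean, count))
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster received no cells
                total = sum(count for _, count in group)
                dim = len(centers[j])
                centers[j] = [sum(m[d] * c for m, c in group) / total for d in range(dim)]
    return centers
```

Because the second step iterates over cell summaries rather than raw points, its per-iteration cost depends on the number of dense cells, which is the source of the speedup the abstract reports. Calling `weighted_kmeans(summarize_dense_cells(points, 1.0, 5), k=2)` on a list of coordinate tuples returns two cluster centers.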

Bibliographic details

Source
2007 | pp. 456-461 | 6 pages
Authors
Ying Lai; R. Orlandic; Wai Gen Yee; S. Kulkarni
Original format
PDF
Keywords
data handling; data mining; pattern clustering; CLUTO; GARDEN-Kmeans; Gamma Region DENsity partition; data space reduction; data summaries; data summarization; hierarchical clustering algorithm; high dimensionality; large data set clustering; large high-d;

Similar documents

1. Scalable model-based clustering for large databases based on data summarization [J]. Huidong Jin, Man-Leung Wong, Leung K.-S. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005, No. 11

2. Scalable Parallel Big Data Summarization Technique Based on Hierarchical Clustering Algorithm [J]. Veronica S. Moertini, Matthew Ariel. Journal of Theoretical and Applied Information Technology. 2020, No. 21

3. Spectral clustering based on iterative optimization for large-scale and high-dimensional data [J]. Zhao Yang, Yuan Yuan, Nie Feiping. Neurocomputing. 2018, No. NOVa27

4. Scalable Clustering for Large High-Dimensional Data Based on Data Summarization [C]. Ying Lai, Ratko Orlandic, Wai Gen Yee. IEEE Symposium on Computational Intelligence and Data Mining. 2007

5. High-Dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products [D]. Zhou, Dunke. 2012

6. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions [O]. E Andres Houseman, Brock C Christensen, Ru-Fang Yeh. 2008

7. Analysis of clinical flow cytometric immunophenotyping data by clustering on statistical manifolds: Treating flow cytometry data as high-dimensional objects [O]. Finn, William G., Carter, Kevin M., Raich, Raviv. Cytometry Part B. 2009; 76B: 1–7
