Scalable Clustering for Large High-Dimensional Data Based on Data Summarization

Abstract

Clustering large data sets with high dimensionality is a challenging data-mining task. This paper presents a framework to perform such a task efficiently. It is based on the notion of data space reduction, which finds high density areas, or dense cells, in the given feature space. The dense cells store summarized information of the data. A designated partitioning or hierarchical clustering algorithm can be used as the second step to find clusters based on the data summaries. Using Kmeans as an example, this paper presents GARDEN-Kmeans, which performs data space reduction using Gamma Region DEN-sity partition, and utilizes Kmeans to cluster the summarized information. The experimental study shows that GARDEN-Kmeans executes several orders of magnitude faster than basic Kmeans and the recursive bisection Kmeans algorithm of CLUTO, while producing comparable clustering quality.
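The two-step idea in the abstract (summarize dense cells, then cluster the summaries) can be illustrated with a minimal sketch. The abstract does not describe how the Gamma Region Density partition works, so a plain uniform grid is swapped in here as a stand-in; this is not GARDEN-Kmeans itself. The function names `summarize` and `weighted_kmeans` and the parameters `cell_width` and `min_count` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def summarize(points, cell_width, min_count):
    """Bin points into a uniform grid (an assumed stand-in for the paper's
    partition) and keep (count, mean) for every cell with enough points."""
    sums = {}
    counts = defaultdict(int)
    for p in points:
        # Grid cell index of the point along each dimension.
        key = tuple(int(x // cell_width) for x in p)
        counts[key] += 1
        acc = sums.setdefault(key, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
    # Sparse cells are dropped; dense cells are reduced to a count and a mean.
    return [
        (n, [s / n for s in sums[key]])
        for key, n in counts.items()
        if n >= min_count
    ]


def weighted_kmeans(summaries, k, iters=20, seed=0):
    """Lloyd-style k-means over cell means, each weighted by its point count."""
    rng = random.Random(seed)
    centers = [list(mean) for _, mean in rng.sample(summaries, k)]
    dim = len(centers[0])
    for _ in range(iters):
        totals = [[0.0] * dim for _ in range(k)]
        weights = [0] * k
        for n, mean in summaries:
            # Assign the whole cell to its nearest center.
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(mean, centers[c])),
            )
            weights[j] += n
            for i, x in enumerate(mean):
                totals[j][i] += n * x
        for j in range(k):
            if weights[j]:  # an empty cluster keeps its previous center
                centers[j] = [t / weights[j] for t in totals[j]]
    return centers
```

Because the second step iterates over dense cells rather than raw points, its cost depends on the number of cells, which is where a speedup of the kind the abstract reports would plausibly come from.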

Bibliographic information

Source: IEEE Symposium on Computational Intelligence and Data Mining, 2007, 6 pages
Authors: Ying Lai; Ratko Orlandic; Wai Gen Yee; Sachin Kulkarni
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature

1. Scalable model-based clustering for large databases based on data summarization [J]. Huidong Jin, Man-Leung Wong, Leung K.-S. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005, no. 11

2. SCALABLE PARALLEL BIG DATA SUMMARIZATION TECHNIQUE BASED ON HIERARCHICAL CLUSTERING ALGORITHM [J]. VERONICA S. MOERTINI, MATTHEW ARIEL. Journal of Theoretical and Applied Information Technology. 2020, no. 21

3. Spectral clustering based on iterative optimization for large-scale and high-dimensional data [J]. Zhao Yang, Yuan Yuan, Nie Feiping. Neurocomputing. 2018, no. NOVa27

4. Scalable Clustering for Large High-Dimensional Data Based on Data Summarization [C]. Ying Lai, Orlandic, R. 2007

5. High-Dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products [D]. Zhou, Dunke. 2012

6. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions [O]. E Andres Houseman, Brock C Christensen, Ru-Fang Yeh. 2008

7. Analysis of clinical flow cytometric immunophenotyping data by clustering on statistical manifolds: Treating flow cytometry data as high-dimensional objects [O]. Finn WG, Carter KM, Raich R, Stoolman LM, Hero AO. Cytometry Part B. 2009; 76B: 1–7
