Contraction Clustering (Raster): A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time



Abstract

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering methods are infeasible due to their memory requirements or runtime complexity. Contraction Clustering (Raster) is a linear-time algorithm for identifying density-based clusters. Its coefficient is negligible, as it depends neither on the input size nor on the number of clusters. Its memory requirements are constant. Consequently, Raster is suitable for big data applications where the size of the data may be huge. It consists of two steps: (1) a contraction step, which projects objects onto tiles, and (2) an agglomeration step, which groups tiles into clusters. Our algorithm is extremely fast. In single-threaded execution on a contemporary workstation, it clusters ten million points in less than 20 s, even when using a slow interpreted programming language like Python. Furthermore, Raster is easily parallelizable.
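
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract only states that objects are projected onto tiles and that tiles are grouped into clusters, so everything else here is an assumption made for illustration: the function names, the `tile_size` and `min_points` parameters, the use of 2D points, the density threshold on tiles, and the rule that tiles sharing an edge or corner belong together.

```python
from collections import Counter, deque
import math


def contract(points, tile_size, min_points):
    """Contraction (assumed form): project each 2D point onto a grid tile
    and keep only tiles holding at least `min_points` points."""
    counts = Counter(
        (math.floor(x / tile_size), math.floor(y / tile_size))
        for x, y in points
    )
    return {tile for tile, n in counts.items() if n >= min_points}


def agglomerate(tiles):
    """Agglomeration (assumed form): group tiles that touch each other,
    including diagonally, into clusters via breadth-first search."""
    remaining = set(tiles)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            tx, ty = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (tx + dx, ty + dy)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.add(neighbour)
                        queue.append(neighbour)
        clusters.append(cluster)
    return clusters


def raster_sketch(points, tile_size=1.0, min_points=2):
    """Run both steps; returns a list of clusters, each a set of tiles."""
    return agglomerate(contract(points, tile_size, min_points))


if __name__ == "__main__":
    sample = [(0.1, 0.2), (0.3, 0.4), (1.1, 0.5), (1.2, 0.6),
              (5.0, 5.0), (5.5, 5.5), (9.9, 9.9)]
    for cluster in raster_sketch(sample):
        print(sorted(cluster))
```

In this sketch each point is touched once during contraction, and the working set afterwards is the collection of occupied tiles rather than the points themselves, which is the intuition behind processing very large inputs with a small memory footprint.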


Bibliographic details

Source
International Workshop on Machine Learning, Optimization, and Big Data, 2017, pp. 63-75 (13 pages)
Conference location: Volterra (IT)
Authors
Gregor Ulm; Emil Gustavsson; Mats Jirstrand
Author affiliation

Fraunhofer-Chalmers Research Centre for Industrial Mathematics, Chalmers Science Park, 412 88 Gothenburg, Sweden

Original format: PDF
Keywords
Algorithms; Big data; Machine learning; Unsupervised learning; Clustering


Similar literature

1. Cluster Analysis on High-Dimensional Data: A Comparison of Density-based Clustering Algorithms [J]. Aina Musdholifah, Siti Zaiton Mohd Hashim. Australian Journal of Basic and Applied Sciences. 2013, no. 2013
2. Novel density-based and hierarchical density-based clustering algorithms for uncertain data [J]. Zhang Xianchao, Liu Han, Zhang Xiaotong. Neural Networks: The Official Journal of the International Neural Network Society. 2017
3. A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data [J]. Chen Jin-Yin, He Hui-Hao. Information Sciences: An International Journal. 2016
4. Contraction Clustering (RASTER) A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time [C]. Gregor Ulm, Emil Gustavsson, Mats Jirstrand. International Workshop on Machine Learning, Optimization, and Big Data. 2018
5. Clustering algorithms for time series gene expression in microarray data [D]. Zhang, Guilin. 2012
6. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream [O]. Amineh Amini, Hadi Saboohi, Teh Ying Wah
7. Real Time Density-Based Clustering (RTDBC) Algorithm for Big Data [O]. Dr. B. Ravi Prasad. 2017
