Conference: International Workshop on Machine Learning, Optimization, and Big Data

Contraction Clustering (Raster): A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time

Abstract

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering methods are infeasible because of their memory requirements or runtime complexity. Contraction clustering (Raster) is a linear-time algorithm for identifying density-based clusters. The coefficient of its linear runtime is negligible, as it depends neither on the input size nor on the number of clusters. Its memory requirements are constant. Consequently, Raster is suitable for big data applications, where the data may be huge. It consists of two steps: (1) a contraction step, which projects objects onto tiles, and (2) an agglomeration step, which groups tiles into clusters. Our algorithm is extremely fast: in single-threaded execution on a contemporary workstation, it clusters ten million points in less than 20 s, even when using a slow interpreted programming language like Python. Furthermore, Raster is easily parallelizable.
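The two steps described in the abstract can be sketched in Python, the language the abstract mentions. This is a minimal illustration under stated assumptions, not the authors' implementation: points are 2D tuples; a tile is the grid cell obtained by scaling coordinates by `10 ** precision` and rounding down; a tile is kept only if it holds at least `threshold` points; and kept tiles that touch horizontally, vertically, or diagonally belong to the same cluster. The function names `contract` and `agglomerate` and the parameters `precision`, `threshold`, and `min_size` are hypothetical.

```python
from collections import Counter, deque
from math import floor

# Hypothetical parameters: precision sets the tile size (10 ** -precision),
# threshold is the minimum number of points for a tile to count as dense,
# min_size is the minimum number of tiles per reported cluster.


def contract(points, precision=1, threshold=3):
    """Contraction: project each point onto a tile and keep dense tiles."""
    scale = 10 ** precision
    counts = Counter()
    for x, y in points:
        # One pass over the input; only per-tile counts are stored.
        counts[(floor(x * scale), floor(y * scale))] += 1
    return {tile for tile, n in counts.items() if n >= threshold}


def agglomerate(tiles, min_size=1):
    """Agglomeration: group touching dense tiles into clusters (BFS)."""
    unvisited = set(tiles)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster = [start]
        queue = deque([start])
        while queue:
            tx, ty = queue.popleft()
            # Check all 8 surrounding tiles (Moore neighborhood).
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (tx + dx, ty + dy)
                    if neighbor in unvisited:
                        unvisited.remove(neighbor)
                        cluster.append(neighbor)
                        queue.append(neighbor)
        if len(cluster) >= min_size:
            clusters.append(cluster)
    return clusters


# Usage: two adjacent dense tiles and one isolated point.
points = [(0.11, 0.12), (0.13, 0.14), (0.15, 0.16),  # tile (1, 1)
          (0.21, 0.12), (0.22, 0.13), (0.25, 0.11),  # tile (2, 1)
          (0.91, 0.95)]                              # tile (9, 9), too sparse
clusters = agglomerate(contract(points, precision=1, threshold=3))
print(sorted(clusters[0]))  # [(1, 1), (2, 1)]
```

In this sketch, contraction touches each point once and stores only per-tile counts, so its memory use is bounded by the number of distinct tiles rather than the number of points; agglomeration then operates on tiles alone. The sketch returns clusters of tiles, not per-point labels. Because tile counts are additive, one plausible (assumed) way to parallelize the contraction step is to count tiles for separate chunks of the input and sum the resulting `Counter` objects before applying the threshold.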