A Multi Density-Based Clustering Algorithm for Data Stream with Noise

Abstract

Density-based clustering algorithms can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. However, they do not work properly in multi-density environments. Existing multi-density clustering algorithms have problems that limit their applicability to data streams, such as needing the whole data set to perform clustering, two-pass clustering, and high execution time. Data streams arrive continuously and have to be processed in limited time and memory. Therefore, we need an algorithm that clusters data streams with different densities and also overcomes the challenges of clustering data streams. In this paper, we introduce a multi-density clustering algorithm for data streams called MuDi-Stream. MuDi-Stream is an online-offline clustering algorithm in which the online phase forms core-mini-clusters using a newly proposed core distance, and the offline phase clusters the core-mini-clusters with a density-based method. The new core distance, called the mini-core distance, is calculated from the number of neighboring data points around the core. The algorithm therefore has different core distances for different clusters, which allows it to cover multi-density environments. A novel pruning strategy is also used to separate the real data from the noise by mapping the outliers onto a grid. The grid structure keeps the neighbors of each data point, which are used to determine the mini-core distance and to remove noise effectively. Our performance study over synthetic data sets demonstrates the effectiveness of our method.
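
As a rough illustration of two ideas named in the abstract (a core distance that adapts to local density, and a grid that buffers potential outliers), the following Python sketch may help. It is not the MuDi-Stream implementation: the abstract gives no formulas, so the mean-neighbor-distance radius, the promotion threshold `min_points`, and the names `grid_cell`, `local_core_distance`, and `OutlierGrid` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def grid_cell(point, cell_size):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(math.floor(x / cell_size) for x in point)


def local_core_distance(center, neighbors):
    """Hypothetical density-adaptive radius: mean distance from center to its neighbors.

    This is a stand-in, not the paper's mini-core distance. It only shows
    that tightly packed neighbors yield a smaller radius than sparse ones.
    """
    if not neighbors:
        return 0.0
    return sum(math.dist(center, p) for p in neighbors) / len(neighbors)


class OutlierGrid:
    """Buffers points that fit no existing cluster, grouped by grid cell (assumed design)."""

    def __init__(self, cell_size, min_points):
        self.cell_size = cell_size
        self.min_points = min_points  # assumed promotion threshold
        self.cells = defaultdict(list)

    def add(self, point):
        """Buffer a point; return the cell's points once it is dense enough, else None."""
        key = grid_cell(point, self.cell_size)
        self.cells[key].append(point)
        if len(self.cells[key]) >= self.min_points:
            # The cell is dense enough to be treated as real data, not noise.
            return self.cells.pop(key)
        return None
```

In this sketch, a caller would feed unassigned stream points to `OutlierGrid.add` and, when a list is returned, seed a new mini-cluster from it, using `local_core_distance` to pick a radius that reflects how tightly those points are packed.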


Bibliographic details

Source
IEEE International Conference on Data Mining Workshops | 2013 | pp. 1105-1112 | 8 pages
Authors
Amini Amineh; Saboohi Hadi; Wah Teh Ying;
Original format: PDF
Keywords
Evolving data streams; core-mini-cluster; density-based clustering; mini-core distance; multi-density;


Similar literature


1. A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data [J]. Chen Jin-Yin, He Hui-Hao. Information Sciences: An International Journal. 2016
2. LeaDen-Stream: A Leader Density-Based Clustering Algorithm over Evolving Data Stream [J]. Amineh Amini, Teh Ying Wah. Journal of Computer and Communications. 2013, No. 5
3. On Density-Based Data Streams Clustering Algorithms: A Survey [J]. Amineh Amini, Teh Ying Wah, Hadi Saboohi. Journal of Computer Science and Technology (English Edition). 2014, No. 1
4. A Multi Density-based Clustering Algorithm for Data Stream with Noise [C]. Amineh Amini, Hadi Saboohi, Teh Ying Wah. IEEE International Conference on Data Mining Workshops. 2013
5. Scalable frameworks and algorithms for cluster ensembles and clustering data streams [D]. Hore, Prodip. 2007
6. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream [O]. Amineh Amini, Hadi Saboohi, Teh Ying Wah
7. On Density-based Clustering Algorithms over Evolving Data Streams: A Summarization Paradigm [O]. Amineh Amini, Teh Ying Wah. 2016
